SUMMARY

These notes are directed at the newcomer to nonlinear programming for whom a
thorough understanding of Lagrange multipliers, the Kuhn-Tucker conditions and the duality
theorem is essential. The notes attempt to explain these foundations of the theory and
what motivates them. Special cases of one or two dimensions are considered and are
extended by means of the notation of vector differentiation to the case of n variables.
The reader is taken in stages from the problem of unconstrained minimization, through
the equation constrained problem, to the general constrained problem. The important
Jacobian assumption is also discussed.

1  INTRODUCTION

These notes are directed at the newcomer to optimization and nonlinear programming.
Such  a  reader  is  confronted  with  a  bewildering  maze  of  conflicting  and,  in  the  author's 
opinion,  inadequate  notation.  This  is  perhaps  to  be  expected  in  one  of  the  newest  and 
most  rapidly  developing  branches  of  mathematics,  but  it  is  a  pity  because  the  foundations 
of  the  subject  can  be  made  to  appear  deep  and  subtle,  when  in  reality  they  consist  of 
simple  results  that  are  easy  to  derive. 

Optimization is concerned with the problem of minimizing a function of several (and
often many) real-valued variables. If the variables themselves are restricted to satisfy
other functional relations, the problem is said to be constrained. It should also be
noted that if we are able to minimize a function f then we can also maximize the
function -f, and vice-versa.

Nonlinear  programming  consists  largely  of  a  collection  of  algorithms  for  use  by 
a  computer  to  solve  optimization  problems  that  involve  nonlinear  functions.  These 
algorithms  are  always  iterative  and,  for  unconstrained  problems,  the  iterations  are 
designed to converge to points that satisfy various necessary and sufficient conditions.

In  addition,  for  constrained  problems  the  techniques  of  Lagrange  multipliers  and  the 
duality  theorem  are  required  to  help  ensure  the  iterations  converge  successfully. 

A  thorough  knowledge  of  these  foundations  of  optimization  theory  is  thus  essential  before 
algorithms  to  solve  practical  problems  can  be  written,  efficiently  implemented,  or  their 
results  meaningfully  interpreted. 

In  these  notes  we  try  not  only  to  explain  the  foundations  of  the  subject  but  also 
to  show  what  motivates  them,  in  the  hope  that  this  will  increase  the  beginner's  insight 
into  the  theory.  We  proceed  by  considering  the  special  cases  of  functions  of  one  or  two 
variables  and  use  geometrical  interpretation  to  aid  our  understanding.  The  results  thus 
obtained  are  then  extended,  by  means  of  the  notation  of  vector  differentiation,  to  the 
case  of  functions  of  n  variables,  where  the  reader  no  longer  has  a  geometrical  crutch 
to  rely  on.  The  results  obtained  for  the  n-dimensional  case  bear  a  striking  similarity 
to  those  for  the  simple  case.  It  is  hoped  that  this  similarity  will  help  to  further 
increase  the  reader's  understanding. 

Also, these notes are deliberately structured to take the reader in stages from the
comparatively simple problem of unconstrained minimization, through the equation
constrained problem (sometimes called the equality constrained problem) to the general
constrained problem (ie minimization subject to both equation and inequality constraints).
However, it is shown that by employing the concept of active constraints the general
constrained problem can be dealt with by considering it as an equation constrained problem.

The important Jacobian assumption is also explained and the consequences of not
assuming it to hold are discussed.

2  PRELIMINARY  THEORY 


In this section we mention some less well-known notation. The notation is adhered
to throughout the rest of these notes. The reader should beware since other authors may
use different notation or they may use the same notation to denote different or even
contradictory statements.

2.1  Notation

We denote the column vector of n variables by $\underline{x}$. In these notes,
underlined lower case letters will always denote column vectors. It will usually be
made clear in the text whether the vectors are constants, variables or vector functions.

Matrices will sometimes be denoted by capital letters. By

$$A = (a_{ij}) \quad\text{or}\quad a_{ij} = (A)_{ij}$$

we shall mean that A is the matrix whose i,j-entry is $a_{ij}$.

Let $A = (a_{ij})$ and $B = (b_{ij})$ be two m x n matrices. We shall write $A < B$ if
and only if $a_{ij} < b_{ij}$ for all i,j.

Suppose f is a function of n variables. Instead of writing $f = f(x_1,\ldots,x_n)$
we shall frequently write $f = f(\underline{x})$ and say that f is a function of $\underline{x}$. Suppose
$f_1,\ldots,f_m$ are m functions of $\underline{x}$. We can write this as $\underline{f}(\underline{x})$.

2.2  Vector differentiation

By vector differentiation we mean the differentiation of a function with respect
to a vector. Note that the function can itself be a vector.

Let f be a function of n variables $\underline{x}$. Then if the partial derivatives
$\partial f/\partial x_1, \ldots, \partial f/\partial x_n$ all exist define

$$\frac{df}{d\underline{x}} \triangleq \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

where the symbol $\triangleq$ means that the left hand side is defined by the right hand side.


Note that many writers use the symbols $\nabla f$, $\nabla_{\underline{x}} f$ or grad f to denote vector
differentiation (see, for instance, Luenberger^1, Dixon^2). However, with their notation it is
sometimes not immediately clear which vector the function f is being differentiated
with respect to. Also, with our present notation, many of the familiar results of
scalar differentiation need little modification when extended to the case of vector
differentiation. Thus the present notation is a useful memory aid and also provides good
insight into how results are extended to more than three dimensions.


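A few writers use the symbol $\partial f/\partial\underline{x}$ to denote vector differentiation (see
Intriligator^3, from whose notation the present one has been modified). When we come to
extend the concept of partial derivative to the vector case we shall see that this
notation too can be confusing and inadequate.

Let $\underline{f}$ be an m x 1 vector function of $\underline{x}$. If the partial derivatives

$$\frac{\partial\underline{f}}{\partial x_i} = \left(\frac{\partial f_1}{\partial x_i},\ \frac{\partial f_2}{\partial x_i},\ \ldots,\ \frac{\partial f_m}{\partial x_i}\right)^T, \qquad i = 1,\ldots,n$$

exist, then define

$$\frac{d\underline{f}}{d\underline{x}} \triangleq \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_1} \\ \vdots & & \vdots \\ \dfrac{\partial f_1}{\partial x_n} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}.$$

$d\underline{f}/d\underline{x}$ is called the Jacobian matrix of $\underline{f}$.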
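The layout of the Jacobian matrix defined above (n rows, one per variable, and m columns, one per component function) can be checked numerically. The following is a minimal sketch only: the function `example_f`, the step size and the use of central differences are illustrative assumptions, not part of the notes.

```python
def jacobian(f, x, h=1e-6):
    """Approximate d f / d x as an n x m list of lists.

    Entry [i][j] holds the central-difference estimate of d f_j / d x_i,
    so row i collects the derivatives with respect to x_i.
    """
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - q) / (2.0 * h) for p, q in zip(f_plus, f_minus)])
    return rows


def example_f(x):
    # Hypothetical vector function with m = 3 components and n = 2 variables.
    return [x[0] * x[1], x[0] ** 2, 3.0 * x[1]]


J = jacobian(example_f, [2.0, 5.0])
print(len(J), "rows (n) by", len(J[0]), "columns (m)")
for row in J:
    print([round(v, 4) for v in row])
```

With this convention each column of the matrix is the vector derivative of one component function, which is what makes the product in the total derivative formula of this section conformable.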

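The definition of the second derivative of f, where f is now a scalar function,
logically follows.

We define

$$\frac{d^2 f}{d\underline{x}^2} \triangleq \frac{d}{d\underline{x}}\left(\frac{df}{d\underline{x}}\right) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n\,\partial x_n} \end{pmatrix}.$$

$d^2 f/d\underline{x}^2$ is called the Hessian matrix of f.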




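With this notation, Taylor's series for functions of n variables $\underline{x}$ is written

$$f(\underline{x} + \Delta\underline{x}) = f(\underline{x}) + \Delta\underline{x}^T\frac{df}{d\underline{x}} + \frac{1}{2!}\,\Delta\underline{x}^T\frac{d^2 f}{d\underline{x}^2}\,\Delta\underline{x} + \ldots$$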

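It is straightforward to give $df/d\underline{x}$ a geometrical interpretation. The gradient
of f at $\underline{x}_0$ along a direction $\underline{v}$ is defined as

$$\lim_{h \to 0} \frac{f(\underline{x}_0 + h\underline{v}) - f(\underline{x}_0)}{h}.$$

Now from Taylor's series,

$$\frac{f(\underline{x}_0 + h\underline{v}) - f(\underline{x}_0)}{h} = \underline{v}^T\frac{df}{d\underline{x}} + O(h)$$

so that the limit is $\underline{v}^T\,df/d\underline{x}$, which is the component of $df/d\underline{x}$ along $\underline{v}$. From
elementary linear algebra we know that, among unit vectors $\underline{v}$, this is greatest when $\underline{v}$
lies along $df/d\underline{x}$. Hence $df/d\underline{x}$ is the gradient of f along the line of steepest slope.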
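The geometrical interpretation can be illustrated by scanning unit directions and comparing the difference quotient of the definition with the length of the vector derivative. This is a sketch under stated assumptions: the test function `f`, the point `x0`, the step sizes and the one-degree scan are all illustrative choices, not taken from the notes.

```python
import math


def f(x):
    # Hypothetical smooth test function of two variables.
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


def gradient(func, x, h=1e-6):
    """Central-difference estimate of the column vector df/dx."""
    g = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g.append((func(x_plus) - func(x_minus)) / (2.0 * h))
    return g


def slope_along(func, x0, v, h=1e-6):
    """Difference quotient (f(x0 + h v) - f(x0)) / h from the definition."""
    shifted = [a + h * b for a, b in zip(x0, v)]
    return (func(shifted) - func(x0)) / h


x0 = [1.0, -2.0]
g = gradient(f, x0)
norm_g = math.sqrt(sum(c * c for c in g))

# Scan unit vectors v = (cos t, sin t) in one-degree steps.
angles = [math.radians(k) for k in range(360)]
best_angle = max(angles, key=lambda t: slope_along(f, x0, [math.cos(t), math.sin(t)]))
best_v = [math.cos(best_angle), math.sin(best_angle)]
best_slope = slope_along(f, x0, best_v)

print("df/dx at x0:", g)
print("steepest unit direction found:", best_v)
print("slope there:", best_slope, " length of df/dx:", norm_g)
```

Up to the resolution of the scan, the steepest direction found should point along df/dx and the slope there should equal its length.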

We now extend our notation to the case of partial differentiation with respect
to vectors. From the theory of scalar partial differentiation, if f is a function of
$x_1,\ldots,x_n$ then

$$df = dx_1\frac{\partial f}{\partial x_1} + \ldots + dx_n\frac{\partial f}{\partial x_n} = d\underline{x}^T\frac{df}{d\underline{x}}.$$
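Suppose now that f is a function of two vectors $\underline{x}$ and $\underline{y}$ where $\underline{y}$ is m x 1.
Then

$$df = dx_1\frac{\partial f}{\partial x_1} + \ldots + dx_n\frac{\partial f}{\partial x_n} + dy_1\frac{\partial f}{\partial y_1} + \ldots + dy_m\frac{\partial f}{\partial y_m} = d\underline{x}^T\frac{\partial f}{\partial\underline{x}} + d\underline{y}^T\frac{\partial f}{\partial\underline{y}},$$

where we use curly $\partial$ to emphasise that differentiation is taking place with respect
to only one of the possible vector variables.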

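The concept of total derivative can also be extended. Suppose the vector $\underline{y}$ is
a function of $\underline{x}$. If we keep all the independent variables except $x_i$, say, fixed
and allow $x_i$ to vary, then the dependent variables $\underline{y}$ will also change. The total
rate of change of f will then be given by

$$\left(\frac{df}{dx_i}\right)_{\underline{x}} = \left(\frac{\partial f}{\partial x_i}\right)_{\underline{x},\underline{y}} + \left(\frac{\partial y_1}{\partial x_i}\right)_{\underline{x}}\frac{\partial f}{\partial y_1} + \ldots + \left(\frac{\partial y_m}{\partial x_i}\right)_{\underline{x}}\frac{\partial f}{\partial y_m} = \left(\frac{\partial f}{\partial x_i}\right)_{\underline{x},\underline{y}} + \left(\frac{\partial\underline{y}}{\partial x_i}\right)^T_{\underline{x}}\frac{\partial f}{\partial\underline{y}} \tag{2-2-1}$$

where the vector suffixes attached to the derivatives are a reminder that the derivatives
with respect to $x_i$ are not equal - the $\underline{x}$ indicating, where it is present, that all
the x (except $x_i$) are kept fixed, the $\underline{y}$ indicating that all the dependent variables
are kept fixed. We then define the total derivative of f with respect to $\underline{x}$ by

$$\frac{df}{d\underline{x}} \triangleq \frac{\partial f}{\partial\underline{x}} + \frac{d\underline{y}}{d\underline{x}}\,\frac{\partial f}{\partial\underline{y}}. \tag{2-2-2}$$

Equation (2-2-2) is of course obtained by repeating (2-2-1) for i = 1,...,n and writing
the result in vector form.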
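The total derivative formula (2-2-2) can be verified on a small example by comparing it with finite differences of the composite function. The sketch below is illustrative only: the functions `y_of_x` and `f`, the evaluation point and the hand-computed partial derivatives are assumptions chosen for the check, with n = m = 2.

```python
def y_of_x(x):
    # Hypothetical dependent variables y = (y1, y2) as functions of x = (x1, x2).
    return [x[0] * x[1], x[0] + x[1] ** 2]


def f(x, y):
    # Hypothetical scalar function of both vectors.
    return x[0] ** 2 + y[0] * y[1] + x[1] * y[0]


def total_derivative_formula(x):
    """Right hand side of (2-2-2) using hand-computed partial derivatives."""
    y = y_of_x(x)
    df_dx_partial = [2.0 * x[0], y[0]]       # y held fixed
    df_dy_partial = [y[1] + x[1], y[0]]      # x held fixed
    dy_dx = [[x[1], 1.0],                    # entry [i][j] = d y_j / d x_i
             [x[0], 2.0 * x[1]]]
    return [df_dx_partial[i] + sum(dy_dx[i][j] * df_dy_partial[j] for j in range(2))
            for i in range(2)]


def total_derivative_numeric(x, h=1e-6):
    """Central differences of F(x) = f(x, y(x)), letting y follow x."""
    out = []
    for i in range(2):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        out.append((f(x_plus, y_of_x(x_plus)) - f(x_minus, y_of_x(x_minus))) / (2.0 * h))
    return out


x = [1.5, -0.5]
formula = total_derivative_formula(x)
numeric = total_derivative_numeric(x)
print("formula (2-2-2):   ", formula)
print("finite differences:", numeric)
```

The two results should agree to within the accuracy of the finite differences, whereas the partial derivative with y held fixed on its own would not.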

Many  of  the  familiar  standard  results  of  scalar  differentiation  can  be  extended 
in  a  modified  form  to  the  vector  case.  Some  of  these  results  are  used  in  subsequent 
sections.  They  are  stated  in  Appendix  A  for  the  reader's  convenience. 

2.3  Tangent  spaces  and  contours 

When  considering  functions  of  two  or  three  variables  we  can  use  our  geometrical 
intuition  to  give  us  insight  into  the  mathematical  problem.  This  is  reflected  in  the 
terminology  we  use.  We  say  that 

f(x,y,z)  =  0  (2-3-1) 

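represents a surface and that if the partial derivatives $\partial f/\partial x$, $\partial f/\partial y$ and $\partial f/\partial z$ are
continuous, then the surface (2-3-1) is smooth. The vector

$$\left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z}\right)^T$$

is called the normal to the surface, and since equations of the form

$$ax + by + cz = \text{const}$$

represent planes, the equation

$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = x_0\frac{\partial f}{\partial x} + y_0\frac{\partial f}{\partial y} + z_0\frac{\partial f}{\partial z}$$

must represent the tangent plane to the surface (2-3-1) at the point $(x_0, y_0, z_0)$,
the partial derivatives being evaluated at that point.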

Provided that the surface (2-3-1) is nowhere perpendicular to the (x,y) plane
(ie $\partial f/\partial z$ is nowhere zero) our geometrical intuition tells us that we can draw contours
of (2-3-1) onto the (x,y) plane of the form

$$g(x,y) = \text{const}. \tag{2-3-2}$$

Algebraically we do this first by transforming (2-3-1) into

$$h(x,y) = z$$

(our geometrical intuition suggests where this might not be possible) and then
substituting constant values of z to obtain a family of contours like (2-3-2).

When  our  problem  functions  are  of  more  than  three  variables,  we  no  longer  have 
a  geometrical  crutch  to  lean  on,  but  the  symbols  we  use  look  similar  and  so  we  employ 
a  similar  language.  We  say  that 

$$f(\underline{x}) = 0 \tag{2-3-3}$$

defines a surface whose tangent hyperplane or tangent space at $\underline{x}_0$ is given by

$$\underline{x}^T\frac{df}{d\underline{x}} = \underline{x}_0^T\frac{df}{d\underline{x}}$$

where $df/d\underline{x}$ is the normal to (2-3-3).

From the implicit function theorem (see Appendix B), provided $\partial f/\partial x_n$ is
nowhere zero, we can rewrite (2-3-3) as

$$g(x_1,\ldots,x_{n-1}) = x_n \tag{2-3-4}$$

which we interpret as a family of contours like

$$g(x_1,\ldots,x_{n-1}) = \text{const};$$

setting $x_n = 0$ we obtain

$$g(x_1,\ldots,x_{n-1}) = 0$$

which only underlines the obvious fact that the terms 'surface' and 'contour' are
interchangeable. We shall use the term 'contour' in these notes.

Finally, we shall find it convenient to define a path from some starting point $\underline{x}_0$
to some endpoint $\underline{x}$, say, to be a sequence of points $\underline{x}_0, \underline{x}_1, \underline{x}_2, \ldots$ which converge to $\underline{x}$.

3  THE  UNCONSTRAINED  PROBLEM 

In  this  section  we  consider  the  unconstrained  minimization  problem 

$$\mathrm{U} \qquad \underset{\underline{x}}{\text{minimize}}\ f(\underline{x})$$


and  we  wish  to  obtain  the  necessary  and  sufficient  conditions  that  x*  be  a  solution 
of  U  .  We  derive  these  for  the  one-dimensional  case  first,  in  the  hope  that  this  will 
provide  insight  when  we  come  to  discuss  the  n-dimensional  problem. 

The only assumption we make is that $f(\underline{x})$ is continuously twice differentiable.
This by no means restricts the scope of our theory since all practical problem functions
can be approximated by polynomials that satisfy our assumption. We also restrict our
definition of a minimum of f to exclude $-\infty$. This is not only convenient for us, but
it also reflects the fact that iterative algorithms would fail to obtain such minima.

3.1  The one-dimensional case

We are interested in the one-dimensional problem

$$\mathrm{U1} \qquad \underset{x}{\text{minimize}}\ f(x).$$

By a solution of U1 we mean a real number x* such that

$$f(x^* + \Delta x) \ge f(x^*) \tag{3-1-1}$$

for all small enough numbers $\Delta x$. Notice that this definition implies that f(x*) might
only be a local minimum of f. In other words, there may be some other numbers x
satisfying f(x) < f(x*) and our definition only ensures that they cannot be near to
x*. In particular there cannot be a path joining x* to x that a computer algorithm
might follow and along which the value of f progressively decreases.

It is well known that the first order and second order necessary conditions for x*
to be a solution of U1 are, respectively

$$\frac{df}{dx}(x^*) = 0 \tag{3-1-2}$$

$$\frac{d^2 f}{dx^2}(x^*) \ge 0 \tag{3-1-3}$$

whilst the sufficient conditions are

$$\frac{df}{dx}(x^*) = 0 \quad\text{and}\quad \frac{d^2 f}{dx^2}(x^*) > 0. \tag{3-1-4}$$

These results are derived from Taylor's theorem

$$f(x^* + \Delta x) = f(x^*) + \Delta x\,\frac{df}{dx}(x^*) + \frac{1}{2!}\,\Delta x^2\,\frac{d^2 f}{dx^2}(x^*) + O(\Delta x^3). \tag{3-1-5}$$

Using (3-1-5) to eliminate f(x* + $\Delta x$) from (3-1-1) and taking f(x*) from each side gives

$$\Delta x\,\frac{df}{dx}(x^*) + \tfrac{1}{2}\Delta x^2\,\frac{d^2 f}{dx^2}(x^*) + O(\Delta x^3) \ge 0. \tag{3-1-6}$$

Suppose we set $\Delta x > 0$, then division of (3-1-6) by $\Delta x$ gives

$$\frac{df}{dx}(x^*) + \tfrac{1}{2}\Delta x\,\frac{d^2 f}{dx^2}(x^*) + O(\Delta x^2) \ge 0. \tag{3-1-7}$$

If we now let $\Delta x \to 0+$ we see that (3-1-7) implies

$$\frac{df}{dx}(x^*) \ge 0. \tag{3-1-8}$$

A similar process for $\Delta x < 0$ gives

$$\frac{df}{dx}(x^*) \le 0. \tag{3-1-9}$$

(3-1-8) and (3-1-9) can only both be true if $\frac{df}{dx}(x^*) = 0$ and so we have the first
result (3-1-2).

Now eliminating df/dx from (3-1-6) by using (3-1-2) leads to

$$\tfrac{1}{2}\Delta x^2\,\frac{d^2 f}{dx^2}(x^*) + O(\Delta x^3) \ge 0. \tag{3-1-10}$$

Dividing (3-1-10) by $\Delta x^2/2$ (which is always positive) leaves

$$\frac{d^2 f}{dx^2}(x^*) + O(\Delta x) \ge 0. \tag{3-1-11}$$

On letting $\Delta x \to 0$ we see that $\frac{d^2 f}{dx^2}(x^*) \ge 0$. This is the second order necessary
condition (3-1-3).

To prove the sufficiency conditions (3-1-4) we assume that they hold and show that
this implies $f(x^* + \Delta x) \ge f(x^*)$ for all sufficiently small $\Delta x$.

Now from the mean value theorem and the second mean value theorem, we know that
there are numbers $\xi, \eta \in [0,1]$ such that

$$\frac{df}{dx}(x^* + \xi\Delta x) = \frac{f(x^* + \Delta x) - f(x^*)}{\Delta x} \tag{3-1-12}$$

and

$$\frac{d^2 f}{dx^2}(x^* + \eta\Delta x) = \frac{\dfrac{df}{dx}(x^* + \Delta x) - \dfrac{df}{dx}(x^*)}{\Delta x}. \tag{3-1-13}$$


Since, by hypothesis, $\frac{df}{dx}(x^*) = 0$, (3-1-13) implies that

$$\frac{d^2 f}{dx^2}(x^* + \eta\Delta x) = \frac{1}{\Delta x}\,\frac{df}{dx}(x^* + \Delta x). \tag{3-1-14}$$

Now $\frac{d^2 f}{dx^2}(x^*) > 0$ means that df/dx is strictly increasing, at least near to x*.
Since $\frac{df}{dx}(x^*) = 0$ and f(x) is continuously differentiable we must have from (3-1-14)
(applied with $\xi\Delta x$ in place of $\Delta x$) that

$$\frac{df}{dx}(x^* + \xi\Delta x) > 0 \quad\text{for } \Delta x > 0 \tag{3-1-15}$$

and

$$\frac{df}{dx}(x^* + \xi\Delta x) < 0 \quad\text{for } \Delta x < 0 \tag{3-1-16}$$

for small enough $\Delta x$.

We can write (3-1-15) and (3-1-16) together as

$$\Delta x\,\frac{df}{dx}(x^* + \xi\Delta x) > 0 \tag{3-1-17}$$

but (3-1-17) is just the left hand side of (3-1-12) multiplied by $\Delta x$. Therefore, the
right hand side of (3-1-12) multiplied by $\Delta x$ is

$$f(x^* + \Delta x) - f(x^*) > 0$$

and so x* is a solution of U1.
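The one-dimensional conditions (3-1-2) to (3-1-4) can be tested numerically at a candidate point. The following is a minimal sketch, assuming central-difference estimates of the derivatives and an arbitrary tolerance; the function name `classify` and the test functions are hypothetical and chosen only to exercise each case.

```python
def first_derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def classify(f, x, tol=1e-4):
    """Apply the first and second order conditions at the candidate point x."""
    if abs(first_derivative(f, x)) > tol:
        return "not stationary"      # (3-1-2) fails
    curvature = second_derivative(f, x)
    if curvature > tol:
        return "minimum"             # sufficient conditions (3-1-4) hold
    if curvature < -tol:
        return "not a minimum"       # necessary condition (3-1-3) fails
    return "undecided"               # necessary conditions hold, sufficient do not


results = {
    "(x - 1)^2 + 2 at x = 1": classify(lambda x: (x - 1.0) ** 2 + 2.0, 1.0),
    "-x^2 at x = 0": classify(lambda x: -x * x, 0.0),
    "x^4 at x = 0": classify(lambda x: x ** 4, 0.0),
    "x^3 at x = 0": classify(lambda x: x ** 3, 0.0),
    "(x - 1)^2 + 2 at x = 0": classify(lambda x: (x - 1.0) ** 2 + 2.0, 0.0),
}
for name, verdict in results.items():
    print(name, "->", verdict)
```

The cases x^4 and x^3 both satisfy the necessary conditions at the origin but not the sufficient ones; the first is a minimum and the second is not, which shows why the two sets of conditions have to be stated separately.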


3.2  The  n-dimensional  case 


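The necessary and sufficient conditions for the n-dimensional problem U can be
derived in a similar manner. For the reader's convenience we state them first. We
stress once again that they are only conditions for $f(\underline{x}^*)$ to be a local minimum of f.
The first order and second order necessary conditions for $\underline{x}^*$ to be a (local) solution
of U are respectively

$$\frac{df}{d\underline{x}}(\underline{x}^*) = \underline{0} \tag{3-2-1}$$

$$\Delta\underline{x}^T\,\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} \ge 0 \tag{3-2-2}$$

for all small enough vectors $\Delta\underline{x}$. The sufficient conditions are

$$\frac{df}{d\underline{x}}(\underline{x}^*) = \underline{0} \quad\text{and}\quad \Delta\underline{x}^T\,\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} > 0 \ \text{ for all } \Delta\underline{x} \ne \underline{0}. \tag{3-2-3}$$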

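The reader should be immediately aware of the similarity of these conditions with
the one-dimensional case. They are also derived in a similar manner. The Taylor
series in n dimensions gives

$$f(\underline{x}^* + \Delta\underline{x}) = f(\underline{x}^*) + \Delta\underline{x}^T\frac{df}{d\underline{x}}(\underline{x}^*) + \frac{1}{2!}\,\Delta\underline{x}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} + \ldots \tag{3-2-4}$$

Writing $\Delta\underline{x} = \Delta x\,\underline{u}$ where $\underline{u}$ is the unit vector in the direction of $\Delta\underline{x}$ and $\Delta x$ is the
magnitude of $\Delta\underline{x}$, (3-2-4) becomes

$$f(\underline{x}^* + \Delta\underline{x}) = f(\underline{x}^*) + \Delta x\,\underline{u}^T\frac{df}{d\underline{x}}(\underline{x}^*) + \frac{1}{2!}\,\Delta x^2\,\underline{u}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\underline{u} + \ldots \tag{3-2-5}$$

To derive the necessary conditions we assume $\underline{x}^*$ is a solution of U, ie

$$f(\underline{x}^* + \Delta\underline{x}) \ge f(\underline{x}^*) \tag{3-2-6}$$

for all small enough vectors $\Delta\underline{x}$. Substituting this inequality into (3-2-5) gives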
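$$0 \le \Delta x\,\underline{u}^T\frac{df}{d\underline{x}}(\underline{x}^*) + \tfrac{1}{2}\Delta x^2\,\underline{u}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\underline{u} + O(\Delta x^3). \tag{3-2-7}$$

As in the one-dimensional case, we divide (3-2-7) by $\Delta x > 0$ and let $\Delta x \to 0+$ to obtain

$$\underline{u}^T\frac{df}{d\underline{x}}(\underline{x}^*) \ge 0. \tag{3-2-8}$$

If we now return to (3-2-4) and make the substitution $\Delta\underline{x} = \Delta x\,\underline{u}$ where $\underline{u}$ is now the unit
vector in the opposite direction to $\Delta\underline{x}$, so that $\Delta x < 0$ and the magnitude of $\Delta\underline{x}$ is $-\Delta x$,
we can obtain, in the obvious way

$$\underline{u}^T\frac{df}{d\underline{x}}(\underline{x}^*) \le 0. \tag{3-2-9}$$

Since $\underline{u}$ is an arbitrary unit vector in both cases, (3-2-8) and (3-2-9) can only hold if

$$\underline{u}^T\frac{df}{d\underline{x}}(\underline{x}^*) = 0. \tag{3-2-10}$$

If we let $\underline{u}$ run through the co-ordinate vectors $\underline{e}_i$ in turn, we see that (3-2-10)
implies $\frac{\partial f}{\partial x_i}(\underline{x}^*) = 0$ for all i, and hence $\frac{df}{d\underline{x}}(\underline{x}^*) = \underline{0}$. This is the first order
necessary condition as required.

Because $\frac{df}{d\underline{x}}(\underline{x}^*) = \underline{0}$, (3-2-7) becomes

$$0 \le \tfrac{1}{2}\Delta x^2\,\underline{u}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\underline{u} + O(\Delta x^3). \tag{3-2-11}$$

Dividing by $\tfrac{1}{2}\Delta x^2$ and letting $\Delta x \to 0$ we see that

$$\underline{u}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\underline{u} \ge 0 \tag{3-2-12}$$

for all unit vectors $\underline{u}$ and hence for all vectors $\Delta\underline{x} = \Delta x\,\underline{u}$. Thus we have proved the
second order necessary condition.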

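All that now remains is to verify the sufficient conditions (3-2-3). From the
mean value theorem in n dimensions and the second mean value theorem in n dimensions
(see Appendix B) we know that there is a number $\xi \in [0,1]$ such that

$$\Delta\underline{x}^T\frac{df}{d\underline{x}}(\underline{x}^* + \xi\Delta\underline{x}) = f(\underline{x}^* + \Delta\underline{x}) - f(\underline{x}^*) \tag{3-2-13}$$

and for all $\eta > 0$,

$$\frac{df}{d\underline{x}}(\underline{x}^* + \eta\Delta\underline{x}) = \frac{df}{d\underline{x}}(\underline{x}^*) + \eta\,\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} + O(\eta^2). \tag{3-2-14}$$

Noting that, by hypothesis, $\frac{df}{d\underline{x}}(\underline{x}^*) = \underline{0}$ and premultiplying (3-2-14) by $\Delta\underline{x}^T$ we have

$$\Delta\underline{x}^T\frac{df}{d\underline{x}}(\underline{x}^* + \eta\Delta\underline{x}) = \eta\,\Delta\underline{x}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} + O(\eta^2). \tag{3-2-15}$$

The right hand side of (3-2-15) is greater than 0 (at least if $\eta$ and $\Delta\underline{x}$ are small
enough) since $\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)$ is positive definite (ie $\Delta\underline{x}^T\frac{d^2 f}{d\underline{x}^2}(\underline{x}^*)\,\Delta\underline{x} > 0$ for all $\Delta\underline{x} \ne \underline{0}$,
again by hypothesis). Therefore the left hand side of (3-2-15) is also greater than 0,
ie

$$\Delta\underline{x}^T\frac{df}{d\underline{x}}(\underline{x}^* + \eta\Delta\underline{x}) > 0 \tag{3-2-16}$$

for all (small enough) $\eta > 0$. By considering (3-2-13) we see that (3-2-16) implies
$f(\underline{x}^* + \Delta\underline{x}) - f(\underline{x}^*) > 0$. We have thus shown that the conditions (3-2-3) are sufficient
for $\underline{x}^*$ to be a solution of U.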
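The sufficient conditions (3-2-3) can be checked numerically at a candidate point by estimating the vector derivative and the Hessian matrix and testing the Hessian for positive definiteness. The sketch below rests on several assumptions that are not in the notes: finite-difference estimates, a Cholesky factorisation as the positive definiteness test (it succeeds exactly when a symmetric matrix is positive definite), and two hypothetical test functions.

```python
import math


def gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def hessian(f, x, h=1e-4):
    """Central-difference estimate of the Hessian matrix d2f/dx2."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for si, sj, sign in ((1, 1, 1.0), (1, -1, -1.0), (-1, 1, -1.0), (-1, -1, 1.0)):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                total += sign * f(y)
            H[i][j] = total / (4.0 * h * h)
    return H


def is_positive_definite(H):
    """Attempt a Cholesky factorisation; it fails on a non-positive pivot."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def satisfies_sufficient_conditions(f, x, tol=1e-6):
    stationary = all(abs(c) < tol for c in gradient(f, x))
    return stationary and is_positive_definite(hessian(f, x))


def bowl(x):
    # Hypothetical function with a minimum at (10/7, -6/7).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2 + x[0] * x[1]


def saddle(x):
    # Hypothetical function that is stationary at the origin but has no minimum there.
    return x[0] ** 2 - x[1] ** 2


bowl_ok = satisfies_sufficient_conditions(bowl, [10.0 / 7.0, -6.0 / 7.0])
saddle_ok = satisfies_sufficient_conditions(saddle, [0.0, 0.0])
print("bowl:", bowl_ok, " saddle:", saddle_ok)
```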

3.3  The quadratic unconstrained problem

We end this section by discussing the special case of U when f is a quadratic
function of $\underline{x}$. The quadratic problem is important because many practical functions
can be approximated by quadratic functions, at least close to their minimum $\underline{x}^*$. The
well-known least squares method of solving

$$A\underline{x} = \underline{b}$$

by writing

$$\underline{\varepsilon} = A\underline{x} - \underline{b}$$

and minimising $\underline{\varepsilon}^T\underline{\varepsilon}$ is an example of a quadratic problem. The quadratic problem also
serves as a useful illustration of the general case.
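The least squares example can be made concrete. Expanding the squared residual gives a quadratic in x whose vector derivative vanishes when A^T A x = A^T b (the normal equations). The following sketch assumes a small overdetermined system with two unknowns, chosen purely for illustration, and solves the 2 x 2 normal equations by Cramer's rule; the helper names are hypothetical.

```python
def normal_equations(A, b):
    """Return A^T A and A^T b for a matrix A stored as a list of rows."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return AtA, Atb


def solve_2x2(M, r):
    """Cramer's rule for a non-singular 2 x 2 system M x = r."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def squared_residual(A, b, x):
    """The quadratic function being minimised: (A x - b)^T (A x - b)."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))


# Hypothetical data: three equations, two unknowns, no exact solution.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]

AtA, Atb = normal_equations(A, b)
x_star = solve_2x2(AtA, Atb)
best = squared_residual(A, b, x_star)
print("x* =", x_star, " squared residual =", best)

# Moving away from x* in any of these directions should not reduce the residual.
for dx in ([0.01, 0.0], [-0.01, 0.0], [0.0, 0.01], [0.0, -0.01]):
    moved = [x_star[0] + dx[0], x_star[1] + dx[1]]
    print(dx, squared_residual(A, b, moved) >= best)
```

In the notation of the quadratic problem discussed in this subsection, the matrix of the quadratic form is 2 A^T A and the linear term is built from 2 A^T b.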

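The general quadratic problem is

$$\mathrm{GQ} \qquad \underset{\underline{x}}{\text{minimize}}\ \left\{ f(\underline{x}) = \tfrac{1}{2}\underline{x}^T A\underline{x} - \underline{b}^T\underline{x} + c \right\}.$$

There are several simplifying assumptions that we can make. First of all, without loss
of generality, A can be replaced by a symmetric matrix, since
$\underline{x}^T A\underline{x} = \underline{x}^T\,\tfrac{1}{2}(A + A^T)\,\underline{x}$. Secondly, since

$$\min_{\underline{x}} f(\underline{x}) = \min_{\underline{x}}\{f(\underline{x}) - c\} + c$$

we can set c = 0. So we shall consider the problem

$$\mathrm{Q} \qquad \underset{\underline{x}}{\text{minimize}}\ \left\{ f(\underline{x}) = \tfrac{1}{2}\underline{x}^T A\underline{x} - \underline{b}^T\underline{x} \right\}$$

where A is symmetric.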

We derive a theorem that tells us something of the conditions under which the problem Q
has a solution. The theorem's implications for the two-dimensional case are then
more fully explored with the hope that the discussion will increase the reader's
understanding of the theorem. However, we first introduce some preliminary definitions
(see Kreyszig).

Let A be an n x n matrix.

A number $\lambda$ which satisfies the equation

$$A\underline{x} = \lambda\underline{x} \tag{3-3-1}$$

for at least one non-zero vector $\underline{x}$ is called an eigenvalue of A. The non-zero
vectors $\underline{x}$ which satisfy (3-3-1) are called the eigenvectors corresponding to $\lambda$. It
can be shown that if A is a real symmetric matrix then all its eigenvalues $\lambda_1,\ldots,\lambda_n$
are real (though not necessarily distinct) and that n corresponding orthogonal
eigenvectors $\underline{x}_1,\ldots,\underline{x}_n$ can be chosen.

The rank of a matrix is the number of its linearly independent columns. Let A
be a symmetric n x n matrix of rank r. Then it can be shown that A has exactly n - r
zero eigenvalues.

An n x n matrix A is said to be positive definite if

$$\underline{x}^T A\underline{x} > 0 \tag{3-3-2}$$

for all non-zero vectors $\underline{x}$. If the strict inequality sign > in (3-3-2) is replaced
by $\ge$ then A is said to be positive semi-definite. It can be proved that if A is
a symmetric n x n positive semi-definite matrix of rank r then n eigenvalues
$\lambda_1,\ldots,\lambda_n$ exist, exactly r of which are positive (but not necessarily distinct) and
the remaining n - r eigenvalues are all zero. It should be clear from the above that,
corresponding to the $\lambda_i$, n orthogonal eigenvectors $\underline{x}_1,\ldots,\underline{x}_n$ can also be chosen.

We  are  now  in  a  position  to  state  and  prove  our  theorem. 

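Theorem  Let A be a positive semi-definite matrix of rank r. Let $\underline{x}_1,\ldots,\underline{x}_{n-r}$ be
the orthogonal eigenvectors of A corresponding to zero eigenvalues. Then the problem
Q has a solution if and only if the vector $\underline{b}$ is orthogonal to every linear combination
of $\underline{x}_1,\ldots,\underline{x}_{n-r}$.

Proof  Since A is positive semi-definite, there may be a non-zero vector $\underline{x}_0$ such that
$\underline{x}_0^T A\underline{x}_0 = 0$. Therefore

$$f(\underline{x}_0) = -\underline{b}^T\underline{x}_0 = \beta, \ \text{say}.$$

By considering, if necessary, $-\underline{x}_0$ we can set $\beta \le 0$, without loss of generality.
But if $\beta < 0$ then $f(r\underline{x}_0) = r\beta \to -\infty$ as $r \to \infty$. Hence Q has no solution. (Except
for the excluded case $f(\underline{x}^*) = -\infty$.) Let $\underline{x}_{n-r+1},\ldots,\underline{x}_n$ be the remaining orthogonal
eigenvectors of A with positive eigenvalues $\lambda_{n-r+1},\ldots,\lambda_n$. Now we may write
$\underline{x}_0 = \alpha_1\underline{x}_1 + \ldots + \alpha_n\underline{x}_n$ for some unique $\alpha_1,\ldots,\alpha_n$ since the eigenvectors form a basis.
Therefore

$$\underline{x}_0^T A\underline{x}_0 = \underline{x}_0^T(\alpha_1\lambda_1\underline{x}_1 + \ldots + \alpha_n\lambda_n\underline{x}_n) = \underline{x}_0^T(\alpha_{n-r+1}\lambda_{n-r+1}\underline{x}_{n-r+1} + \ldots + \alpha_n\lambda_n\underline{x}_n)$$

since $\lambda_1 = \ldots = \lambda_{n-r} = 0$.

Therefore

$$\underline{x}_0^T A\underline{x}_0 = (\alpha_1\underline{x}_1 + \ldots + \alpha_n\underline{x}_n)^T(\alpha_{n-r+1}\lambda_{n-r+1}\underline{x}_{n-r+1} + \ldots + \alpha_n\lambda_n\underline{x}_n) = \alpha_{n-r+1}^2\lambda_{n-r+1}\underline{x}_{n-r+1}^T\underline{x}_{n-r+1} + \ldots + \alpha_n^2\lambda_n\underline{x}_n^T\underline{x}_n \tag{3-3-3}$$

since the $\underline{x}_i$ are orthogonal. But the right hand side of (3-3-3) is positive if at
least one $\alpha_i \ne 0$, (i = n - r + 1,...,n). Hence $\alpha_i = 0$, (i = n - r + 1,...,n) since
$\underline{x}_0^T A\underline{x}_0 = 0$. Therefore $\underline{x}_0 = \alpha_1\underline{x}_1 + \ldots + \alpha_{n-r}\underline{x}_{n-r}$.

Thus $\underline{x}_0$ is a linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$. Thus any vector $\underline{x}_0$ such that
$\underline{x}_0^T A\underline{x}_0 = 0$ must be a linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$. We have shown that if
$\underline{b}^T\underline{x}_0 \ne 0$ where $\underline{x}_0$ is a linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$ then the problem Q has
no solution. We have thus proved the first part of the theorem: Q has a solution only
if $\underline{b}$ is orthogonal to every linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$. Note that the
case $\underline{b} = \underline{0}$ always has a solution $\underline{x}^* = \underline{0}$.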

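We next suppose that $\underline{b}$ is orthogonal to every linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$.
We can write

$$\underline{x} = \alpha_1\underline{x}_1 + \ldots + \alpha_n\underline{x}_n$$

and

$$\underline{b} = \beta_1\underline{x}_1 + \ldots + \beta_n\underline{x}_n.$$

Using the above arguments we have

$$\underline{x}^T A\underline{x} = \alpha_{n-r+1}^2\lambda_{n-r+1}\underline{x}_{n-r+1}^T\underline{x}_{n-r+1} + \ldots + \alpha_n^2\lambda_n\underline{x}_n^T\underline{x}_n.$$

Without loss of generality, we can assume the $\underline{x}_i$ are orthonormal. We get

$$\underline{x}^T A\underline{x} = \alpha_{n-r+1}^2\lambda_{n-r+1} + \ldots + \alpha_n^2\lambda_n.$$

But since $\underline{b}$ is orthogonal to every linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$,

$$\underline{b}^T\underline{x} = \alpha_{n-r+1}\beta_{n-r+1} + \ldots + \alpha_n\beta_n$$

therefore

$$f(\underline{x}) = \sum_{i=n-r+1}^{n}\left(\tfrac{1}{2}\alpha_i^2\lambda_i - \alpha_i\beta_i\right) \qquad (\lambda_i \ne 0)$$

therefore, each term being smallest when $\alpha_i = \beta_i/\lambda_i$,

$$\min_{\underline{x}} f(\underline{x}) = -\tfrac{1}{2}\sum_{i=n-r+1}^{n}\frac{\beta_i^2}{\lambda_i}.$$

Hence Q has a solution if $\underline{b}$ is orthogonal to every linear combination of $\underline{x}_1,\ldots,\underline{x}_{n-r}$.
We conclude this subsection by considering five examples as an illustration of the
above.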


(1)

$$z = x^2 + y^2 = (x\ \ y)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

The minimum of z is obviously at the origin (see Fig 1).
Note that $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is positive definite and has no zero eigenvalues.

(2)

$$z = x^2 + y = (x\ \ y)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + (x\ \ y)\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

This has no (finite) minimum (see Fig 2). Note that $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$
is positive semi-definite and has a zero eigenvalue with eigenvector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

(3)

$$z = (x^2 + y^2) + y = (x\ \ y)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + (x\ \ y)\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Because $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is positive definite, the function,
similar to that of example (2) in other respects, has a unique minimum (see Fig 3).
We know, from the theory of general $f(\underline{x})$, that if $A\ (= d^2 f/d\underline{x}^2)$ is positive definite
then there exists a solution.

(4)

This is an example of a quadratic function with a positive semi-definite matrix that has
a solution (actually an infinite number of solutions - see Fig 4).
The eigenvector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ corresponding to the zero eigenvalue is orthogonal to $\underline{b} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.

(5)

$$z = y^2 - x^2 = (x\ \ y)\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

We include this example as an illustration of what f may look like when the matrix is
non-definite (see Fig 5).
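The theorem of this subsection can be applied mechanically to two-dimensional examples of this kind. The sketch below is an illustration under stated assumptions: it handles only symmetric 2 x 2 matrices, finds the eigenvectors with a single plane rotation (a standard closed form, not a method given in the notes), and writes each example in the form f = (1/2) x^T A x - b^T x, so the matrices are twice those displayed above and b carries the opposite sign. The function name and the tolerance are hypothetical, and the entry labelled "like (4)" is a stand-in with the stated properties rather than the example from the notes.

```python
import math


def quadratic_has_minimum(A, b, tol=1e-12):
    """Test whether f(x) = (1/2) x^T A x - b^T x has a finite minimum.

    A is a symmetric 2 x 2 matrix given as nested lists and b a 2-vector.
    The test requires every eigenvalue to be non-negative and b to be
    orthogonal to each eigenvector whose eigenvalue is zero.
    """
    # The rotation angle that diagonalises a symmetric 2 x 2 matrix.
    theta = 0.5 * math.atan2(2.0 * A[0][1], A[0][0] - A[1][1])
    c, s = math.cos(theta), math.sin(theta)
    for v in ([c, s], [-s, c]):                       # orthonormal eigenvectors
        Av = [A[0][0] * v[0] + A[0][1] * v[1],
              A[1][0] * v[0] + A[1][1] * v[1]]
        eigenvalue = v[0] * Av[0] + v[1] * Av[1]
        if eigenvalue < -tol:
            return False                              # not positive semi-definite
        if abs(eigenvalue) <= tol and abs(b[0] * v[0] + b[1] * v[1]) > tol:
            return False                              # b not orthogonal to this eigenvector
    return True


examples = {
    "(1) x^2 + y^2": ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
    "(2) x^2 + y": ([[2.0, 0.0], [0.0, 0.0]], [0.0, -1.0]),
    "(3) x^2 + y^2 + y": ([[2.0, 0.0], [0.0, 2.0]], [0.0, -1.0]),
    "like (4): x^2 + x (hypothetical)": ([[2.0, 0.0], [0.0, 0.0]], [-1.0, 0.0]),
    "(5) y^2 - x^2": ([[-2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
}
verdicts = {name: quadratic_has_minimum(A, b) for name, (A, b) in examples.items()}
for name, verdict in verdicts.items():
    print(name, "->", "has a minimum" if verdict else "no finite minimum")
```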

4  THE  EQUATION  CONSTRAINED  PROBLEM 

In  this  section  we  consider  the  equation  constrained  problem 


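$$\mathrm{E} \qquad \underset{\underline{x}}{\text{minimize}}\ f(\underline{x}) \quad\text{subject to}\quad \underline{q}(\underline{x}) = \underline{0}$$

where the vector equation $\underline{q}(\underline{x}) = \underline{0}$ represents m equation constraints of the type
$q_i(\underline{x}) = 0$, all of which must be satisfied at the solution $\underline{x}^*$ of E.

Any point $\underline{x}$ which satisfies $\underline{q}(\underline{x}) = \underline{0}$ we shall call feasible. Suppose $\underline{x}$ is
a feasible point. If $\underline{x} + \Delta\underline{x}$ is also feasible, then $\Delta\underline{x}$ is said to be feasible at $\underline{x}$
or sometimes a feasible direction at $\underline{x}$. By a solution of E we mean a feasible point
$\underline{x}^*$ such that $f(\underline{x}^*) \le f(\underline{x}^* + \Delta\underline{x})$ for all small enough feasible directions $\Delta\underline{x}$ at $\underline{x}^*$.
As before we assume that $f(\underline{x})$ is continuously twice differentiable and that $f(\underline{x}^*) > -\infty$.
Finally we stress again that our theory concerns local solutions to our problems.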

We  shall  begin  our  discussion  by  examining  the  two-dimensional  situation  and  using 
any  insight  gained  to  help  us  tackle  the  n-dimensional  problem. 

4.1  The two-dimensional problem

The simplest equation constrained minimization problem is the two-dimensional problem

$$\mathrm{E2} \qquad \underset{x,y}{\text{minimize}}\ f(x,y) \quad\text{subject to}\quad q(x,y) = 0.$$

The constraint equation can be thought of as a contour in the (x,y) plane. Provided the
contour is not everywhere parallel to one of the axes (so this rules out q(x,y) = x = 0
for instance) it is possible to rewrite the constraint as

$$y = Y(x).$$

The problem E2 becomes

$$\mathrm{E1} \qquad \underset{x}{\text{minimize}}\ f(x, Y(x))$$


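and the solution of this problem is given by

$$\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = 0 \tag{4-1-1}$$

where df/dx is merely the gradient of f along the contour y = Y(x). Suppose, for
the moment, that $\frac{\partial f}{\partial y} \ne 0$ (but we shall bear this assumption in mind in the following
discussion). (4-1-1) can be rewritten as

$$\frac{dy}{dx} = -\frac{\partial f}{\partial x}\left(\frac{\partial f}{\partial y}\right)^{-1}. \tag{4-1-2}$$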

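Also we can write the equation constraint as

$$q(x,y) = q(x, y(x)) = 0. \tag{4-1-3}$$

Differentiating (4-1-3) totally with respect to x we get

$$\frac{dq}{dx} = \frac{\partial q}{\partial x} + \frac{\partial q}{\partial y}\frac{dy}{dx} = 0. \tag{4-1-4}$$

If we assume $\frac{\partial q}{\partial y} \ne 0$, we can write (4-1-4) as

$$\frac{dy}{dx} = -\frac{\partial q}{\partial x}\left(\frac{\partial q}{\partial y}\right)^{-1}. \tag{4-1-5}$$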

Equating  (4-1-2)  and  (4-1-5)  and  rearranging  we  get 


ii(3ar'  =  (iiw^r 

3x\3x/  \3yl\ay) 


(4-1-6) 


where  we  have  assumed  j  0  .  If  we  set  the  value  of  each  side  of  (4-1-6)  equal  to  -X 
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we  can  write 


0 


(4-1-7) 


3f 

3y 


+ 


0  . 


(4-1-8) 


Lagrange  (1736-1813)  noticed  that  (4-1-7)  and  (4-1-8)  are  simply  the  conditions  that 
are  necessarily  satisfied  by  a  stationary  point  of  the  function 

£(x,y,A)  =  f (x,y)  +  Aq(x,y)  (4-1-9) 


whilst  the  condition 


31 

3A 


q(x,y)  =  0 


(4-1-10) 


simply  incorporates  the  constraint  into  the  problem.  JC  is  called  the  Lagrangian  or 
augmented  function  of  the  problem  and  A  is  called  a  Lagrange  multiplier. 

We  seek  some  geometrical  interpretation  of  the  algebra.  Equations  (4-1-2)  and 
(4-1-5)  simply  state  that  at  the  solution  of  E2,  the  gradients  of  the  contours 

q(x,y)  =  0  (4-1-11) 

and 

f(x,y)  =  f* 

are  equal,  where  f*  is  the  value  of  f  at  the  solution  (x*,y*).  We  find  that  this 
interpretation  agrees  with  our  geometrical  intuition  (see  Fig  6).  For  if  the  gradients 
are  not  parallel,  then  the  contours  must  intersect  at  an  angle.  Except  for  the  special 
case  when  the  constrained  minimum  and  the  unconstrained  minimum  coincide,  this  must 
mean  there  are  points  on  the  constraint  contour  (4-1-11)  on  one  side  or  the  other  of 
(x*,y*)  where  f  <  f*  ,  which  is  a  contradiction. 

It is important to note that the method of Lagrange multipliers may sometimes
be used when the problem functions do not satisfy the assumptions made in the above
discussion, namely that ∂f/∂y, ∂q/∂x and ∂q/∂y are non-zero. These assumptions
were only made in the interest of easing our derivation of (4-1-7) and (4-1-8) and
ensured the existence of a unique Lagrange multiplier λ.

In fact the method of Lagrange multipliers will work even when our above
assumptions do not hold, provided there exists a λ (not necessarily unique) as well
as x and y which satisfy (4-1-7), (4-1-8) and (4-1-10). The sufficiency of this
is proved in section 4.3 for the more general n-dimensional equation constrained case.
We do not bother to prove it here because the reader would gain little geometrical
insight from the proof for the present (two-dimensional) case.
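
As a small numerical illustration of conditions (4-1-7), (4-1-8) and (4-1-10), the sketch below checks them on a problem of our own choosing (minimize x² + y² subject to x + y - 1 = 0). The problem, the function names and the hand-derived multiplier are assumptions made for illustration only; they do not come from these notes.

```python
# Illustrative problem (our assumption, not taken from the notes):
#   minimize f(x, y) = x**2 + y**2   subject to   q(x, y) = x + y - 1 = 0.

def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy) of f = x**2 + y**2."""
    return (2.0 * x, 2.0 * y)

def grad_q(x, y):
    """Partial derivatives (dq/dx, dq/dy) of q = x + y - 1."""
    return (1.0, 1.0)

def q(x, y):
    return x + y - 1.0

def lagrange_residuals(x, y, lam):
    """Left-hand sides of (4-1-7), (4-1-8) and (4-1-10)."""
    fx, fy = grad_f(x, y)
    qx, qy = grad_q(x, y)
    return (fx + lam * qx, fy + lam * qy, q(x, y))

# For this problem the three conditions read
#   2x + lam = 0,  2y + lam = 0,  x + y - 1 = 0,
# so x = y = -lam/2 and the constraint then forces lam = -1.
lam = -1.0
x_star = y_star = -lam / 2.0

residuals = lagrange_residuals(x_star, y_star, lam)

# The contours of f and q touch at the solution: their gradient vectors are
# parallel, so the 2-D cross product of the gradients vanishes.
fx, fy = grad_f(x_star, y_star)
qx, qy = grad_q(x_star, y_star)
cross = fx * qy - fy * qx

print("solution:", (x_star, y_star), "multiplier:", lam)
print("residuals:", residuals, "cross product:", cross)
```

The vanishing cross product is the numerical counterpart of the geometrical statement that the contour of f through the solution and the constraint contour share a common tangent there.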

4.2 The n-dimensional problem

As usual, we merely extend the notation of the two-dimensional case to the
n-dimensional situation. However, we must first make two assumptions about the equation
constraints

    q(x) = 0 .    (4-2-1)

The first assumption we make is that the feasible set (ie the set of all points satisfying
(4-2-1)) is such that there is a path an iterative algorithm can follow from some starting
point to the solution x*. For the purposes of these notes, we shall express this
succinctly by saying that x* is assumed to be not isolated. If no such path exists the
point x* is said to be isolated. The second assumption we make is much less obvious.

We shall assume that

    \operatorname{rank}\,\frac{dq}{dx}(x^*) = m .    (4-2-2)

In other words, the gradient vectors

    \frac{dq_1}{dx}(x^*), \ldots, \frac{dq_m}{dx}(x^*)

are linearly independent. (4-2-2) is called the Jacobian assumption. We make this
assumption because (as we shall see) it considerably simplifies the general proof of the
method of Lagrange multipliers. However it is important to note that in general the
equation constraints need not satisfy (4-2-2). The implications of this are discussed
more fully in section 4.4.

We wish to solve the problem E where the m equation constraints q satisfy the
Jacobian assumption. Note that the q then also satisfy the conditions of the implicit
function theorem (see Appendix B). Therefore there exists a vector function Ψ such
that (re-ordering the x_i if necessary)

    x_i = \Psi_i(x_{m+1}, \ldots, x_n) , \qquad i = 1, \ldots, m .

If we write

    u = (x_{m+1}, \ldots, x_n)^T , \qquad v = (x_1, \ldots, x_m)^T

then by the Jacobian assumption rank(∂q/∂v) = m and our problem becomes

    Eu    \min_{u} f(\Psi(u), u)

From equation (3-2-1) a first-order necessary condition that u* be a solution of Eu
is that

    \frac{df}{du} = \frac{\partial f}{\partial u} + \frac{dv}{du}\frac{\partial f}{\partial v} = 0 .    (4-2-3)

Also

    q(x) = q(\Psi(u), u) = 0 ,

therefore

    \frac{dq}{du} = \frac{\partial q}{\partial u} + \frac{dv}{du}\frac{\partial q}{\partial v} = 0 .    (4-2-4)

By our above rearrangement, ∂q/∂v is non-singular. Hence we can write

    \frac{dv}{du} = -\frac{\partial q}{\partial u}\left(\frac{\partial q}{\partial v}\right)^{-1} .    (4-2-5)

Substituting (4-2-5) into (4-2-3) gives

    \frac{df}{du} = \frac{\partial f}{\partial u} - \frac{\partial q}{\partial u}\left(\frac{\partial q}{\partial v}\right)^{-1}\frac{\partial f}{\partial v} = 0 .    (4-2-6)

Now because

    \frac{\partial q}{\partial v}\left(\frac{\partial q}{\partial v}\right)^{-1} = I_m    (4-2-7)

it follows, by postmultiplying (4-2-7) by ∂f/∂v that

    \frac{\partial q}{\partial v}\left(\frac{\partial q}{\partial v}\right)^{-1}\frac{\partial f}{\partial v} = \frac{\partial f}{\partial v}

or rearranging we get

    \frac{\partial f}{\partial v} - \frac{\partial q}{\partial v}\left(\frac{\partial q}{\partial v}\right)^{-1}\frac{\partial f}{\partial v} = 0 .    (4-2-8)

If we write

    \lambda = -\left(\frac{\partial q}{\partial v}\right)^{-1}\frac{\partial f}{\partial v}    (4-2-9)

then equations (4-2-6) and (4-2-8) can be written as

    \frac{\partial f}{\partial u} + \frac{\partial q}{\partial u}\lambda = 0    (4-2-10)

    \frac{\partial f}{\partial v} + \frac{\partial q}{\partial v}\lambda = 0 .    (4-2-11)

But u and v are merely partitions of a rearrangement of our original vector x, so
that by recombining u and v, and rearranging if necessary, (4-2-10) and (4-2-11) can
be written succinctly as

    \frac{df}{dx} + \frac{dq}{dx}\lambda = 0 .

Thus a first order necessary condition that x* be a solution of E is that

    \frac{\partial\mathcal{L}}{\partial x}(x^*) = 0    (4-2-12)

where ℒ(x) = f(x) + λ^T q(x). As already discussed for the two-dimensional case, the
condition ∂ℒ/∂λ (x*,λ) = 0 is just a restatement of the constraints q(x) = 0. Notice
also that because of (4-2-9), the Jacobian assumption guarantees the existence of unique
Lagrange multipliers λ.

In view of (4-2-12) therefore, the solution of any equation constrained problem E,
satisfying the Jacobian assumption, is also a stationary point of the associated Lagrangian
function ℒ. This is a very useful result because algorithms to find a stationary point
of the equivalent Lagrangian problem are of course much easier to design than algorithms
to solve the original problem E.
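
The partitioned formula (4-2-9) for the multipliers can be tried out numerically. The following is a minimal sketch on a three-variable problem with two equation constraints; the problem, the choice of partition and the helper functions `inverse_2x2` and `mat_vec` are all our own assumptions for illustration, not part of these notes.

```python
# Illustrative problem (our assumption, not from the notes), n = 3, m = 2:
#   minimize f = x1**2 + x2**2 + x3**2
#   subject to q1 = x1 + x2 + x3 - 3 = 0 and q2 = x1 - x2 = 0,
# whose solution is x* = (1, 1, 1).  We take v = (x1, x2) and u = (x3,).

def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ValueError("dq/dv is singular: re-order the variables")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def mat_vec(a, x):
    """Matrix-vector product for nested lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

x_star = (1.0, 1.0, 1.0)

# Gradient of f, split into its v and u parts.
df_dv = [2.0 * x_star[0], 2.0 * x_star[1]]
df_du = [2.0 * x_star[2]]

# Following the notes, column j of dq/dx is the gradient of q_j, so
# dq/dv has entries d q_j / d v_i and dq/du has entries d q_j / d u_i.
dq_dv = [[1.0, 1.0],    # d q1/d x1, d q2/d x1
         [1.0, -1.0]]   # d q1/d x2, d q2/d x2
dq_du = [[1.0, 0.0]]    # d q1/d x3, d q2/d x3

# (4-2-9): lambda = -(dq/dv)^(-1) df/dv
lam = [-t for t in mat_vec(inverse_2x2(dq_dv), df_dv)]

# (4-2-10) and (4-2-11): both residuals should vanish at x*.
res_u = [g + t for g, t in zip(df_du, mat_vec(dq_du, lam))]
res_v = [g + t for g, t in zip(df_dv, mat_vec(dq_dv, lam))]

print("multipliers:", lam)
print("residual of (4-2-10):", res_u, "residual of (4-2-11):", res_v)
```

Here the second multiplier comes out as zero, which reflects the fact that the constraint x1 = x2 would be satisfied at the minimum even if it were dropped.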

4.3 Second order conditions

For the reader's convenience we begin this section by stating the second order
necessary and sufficient conditions that x* be a (local) solution of E. As we shall
see, they are very easy to prove.

The second order necessary condition is that

    \Delta x^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\,\Delta x \ge 0    (4-3-1)

for all small enough vectors Δx that satisfy Δx^T (dq/dx)(x*) = 0^T. The sufficient
conditions are

    \frac{\partial\mathcal{L}}{\partial x}(x^*) = 0 ,

    \frac{\partial\mathcal{L}}{\partial\lambda}(x^*) = q(x^*) = 0

and

    \Delta x^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\,\Delta x > 0    (4-3-2)

for all non-zero vectors Δx that satisfy Δx^T (dq/dx)(x*) = 0^T.

It is left to the reader to provide the parallel discussion concerning the
two-dimensional case if he still requires geometrical insight. The similarities in notation
are such that he should have no difficulty.

In section 4.2 we only showed that x* is a stationary point of ℒ. However,
it is possible that there are other stationary points as well. Thus an algorithm
designed to converge at a stationary point of ℒ may not converge at x*. In view of
the sufficiency conditions (4-3-2), we see that if the algorithm is designed to converge
at a stationary point x* of ℒ such that (4-3-2) is also satisfied then x* is
a solution of E. It is for this reason that the sufficiency conditions are important.
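
The role of the restriction to vectors Δx tangent to the constraints can be seen in a small example. The sketch below uses a problem of our own choosing (minimize -xy subject to x + y - 2 = 0), for which the Hessian of the Lagrangian is indefinite and yet the sufficient conditions (4-3-2) hold. The problem and the helper `quad_form` are illustrative assumptions, not taken from these notes.

```python
# Illustrative problem (our assumption, not from the notes):
#   minimize f(x, y) = -x*y   subject to   q(x, y) = x + y - 2 = 0.
# The Lagrangian L = -x*y + lam*(x + y - 2) is stationary at
# x* = (1, 1) with lam = 1.

def quad_form(h, d):
    """Return d^T h d for a square matrix h (nested lists) and vector d."""
    n = len(d)
    return sum(d[i] * h[i][j] * d[j] for i in range(n) for j in range(n))

x_star = (1.0, 1.0)
lam = 1.0

# First order conditions of (4-3-2): dL/dx = 0 and q = 0 at x*.
dL_dx = (-x_star[1] + lam, -x_star[0] + lam)
q_val = x_star[0] + x_star[1] - 2.0

# Hessian of L with respect to x; q is linear, so it is just the Hessian of f.
hess_L = [[0.0, -1.0],
          [-1.0, 0.0]]

grad_q = (1.0, 1.0)
tangent = (1.0, -1.0)      # satisfies tangent^T grad_q = 0
non_tangent = (1.0, 1.0)   # violates the constraint to first order

tangent_dot_grad = sum(t * g for t, g in zip(tangent, grad_q))
curvature_tangent = quad_form(hess_L, tangent)          # positive
curvature_non_tangent = quad_form(hess_L, non_tangent)  # negative

print("curvature along the constraint:", curvature_tangent)
print("curvature across the constraint:", curvature_non_tangent)
```

Because the tangent space is one-dimensional in this example, testing a single tangent direction is enough; in higher dimensions the quadratic form would have to be positive on the whole subspace.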


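We derive (4-3-1) as follows. As in section 4.2, since the constraints q satisfy
the Jacobian assumption, our problem E is equivalent to the unconstrained problem Eu.
A second order necessary condition that u* be a solution of Eu is that

    \Delta u^T \frac{d^2 f}{du^2}(\Psi(u^*), u^*)\,\Delta u \ge 0    (4-3-3)

for all vectors Δu. Writing x = (Ψ(u), u), so that dx/du = (dv/du  I), and applying the
chain rule twice we have that

    \frac{d^2 f}{du^2} = \frac{dx}{du}\frac{d^2 f}{dx^2}\left(\frac{dx}{du}\right)^T + \sum_{j=1}^{m}\frac{\partial f}{\partial v_j}\frac{d^2 v_j}{du^2}    (4-3-4)

where the total derivative is taken along the contour v = Ψ(u). But q_i(Ψ(u), u) = 0
for i = 1,...,m and for all u. Differentiating twice we have, in the same way, that

    0 = \frac{d^2 q_i}{du^2} = \frac{dx}{du}\frac{d^2 q_i}{dx^2}\left(\frac{dx}{du}\right)^T + \sum_{j=1}^{m}\frac{\partial q_i}{\partial v_j}\frac{d^2 v_j}{du^2} .    (4-3-5)

Adding (4-3-4) and λ_i times (4-3-5) for i = 1,...,m we have

    \frac{d^2 f}{du^2} = \frac{dx}{du}\frac{\partial^2\mathcal{L}}{\partial x^2}\left(\frac{dx}{du}\right)^T + \sum_{j=1}^{m}\left(\frac{\partial f}{\partial v_j} + \sum_{i=1}^{m}\lambda_i\frac{\partial q_i}{\partial v_j}\right)\frac{d^2 v_j}{du^2} .    (4-3-6)

But ∂ℒ/∂x (x*) = 0 means that in particular

    \frac{\partial f}{\partial v_j} + \sum_{i=1}^{m}\lambda_i\frac{\partial q_i}{\partial v_j} = 0 \qquad \text{for } j = 1, \ldots, m

so that the last sum in (4-3-6) vanishes at x*. If we write

    \Delta x = \left(\frac{dx}{du}\right)^T \Delta u    (4-3-7)

then, from (4-2-4), Δx satisfies

    \Delta x^T \frac{dq}{dx}(x^*) = \Delta u^T \frac{dq}{du} = 0^T    (4-3-8)

and conversely, by (4-2-5), every Δx satisfying (4-3-8) can be written in the form (4-3-7).
Therefore (4-3-3) becomes

    \Delta u^T \frac{dx}{du}\frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\left(\frac{dx}{du}\right)^T \Delta u \ge 0 .    (4-3-9)

We have, therefore, proved the second order necessary condition since in view of
(4-3-7), equation (4-3-9) implies that

    \Delta x^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\,\Delta x \ge 0

for any vector Δx satisfying (4-3-8).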

We now prove the sufficiency conditions (4-3-2). Note carefully that they imply
the existence of a set of (not necessarily unique) Lagrange multipliers λ. For reasons
discussed in the next subsection, we prove the sufficiency of (4-3-2) without appealing
to the Jacobian assumption. We shall assume that (4-3-2) holds and that x* is not
a solution of E. We then obtain a proof by arriving at a contradiction.

If x* is a feasible point and not a solution of E, then since x* is not
isolated (see section 4.2), there must exist a sequence of feasible points
x_n = x* + Δx_n which converge to x* and which satisfy f(x_n) < f(x*), n = 1,2,...
Let u_n be the unit vector along the direction of Δx_n. Then the sequence u_1, u_2, ...
is obviously bounded and Δx_n = |Δx_n| u_n where |Δx_n| > 0 is the magnitude of Δx_n. It
is easy to show that |Δx_n| → 0 as n → ∞.

It is well-known that a bounded sequence has a convergent subsequence. Hence
u_1, u_2, ... has a convergent subsequence. Let this subsequence be y_1, y_2, ... and
suppose it converges to y. Let the corresponding modulus of each vector y_n be Δy_n
such that z_n = x* + Δy_n y_n is a feasible vector.

Now since each z_n is feasible we have that

    q(z_n) - q(x^*) = 0 .

Dividing by Δy_n, we have

    \frac{q(x^* + \Delta y_n y_n) - q(x^*)}{\Delta y_n} = 0 .

Now from Taylor's theorem for vector functions of vector variables (see Appendix B) we
have that

    \frac{q(x^* + \Delta y_n y_n) - q(x^*)}{\Delta y_n} = \left(\frac{dq}{dx}(x^*)\right)^T y_n + O(\Delta y_n)

where we interpret the symbol O(Δy_n) as a vector of symbols O(Δy_n). On letting
n → ∞ we see that

    y^T \frac{dq}{dx}(x^*) = 0^T .

Let λ_j be the Lagrange multiplier corresponding to the constraint q_j(x). Then
from Taylor's theorem for scalar functions we have for j = 1,...,m that

    0 = \lambda_j q_j(z_n) = \lambda_j q_j(x^*) + \lambda_j \Delta y_n\, y_n^T \frac{dq_j}{dx}(x^*) + \tfrac{1}{2}\Delta y_n^2\, y_n^T \lambda_j \frac{d^2 q_j}{dx^2}(x^*)\, y_n + O(\Delta y_n^3) .    (4-3-10)

Also

    0 > f(z_n) - f(x^*) = \Delta y_n\, y_n^T \frac{df}{dx}(x^*) + \tfrac{1}{2}\Delta y_n^2\, y_n^T \frac{d^2 f}{dx^2}(x^*)\, y_n + O(\Delta y_n^3) .    (4-3-11)

Adding (4-3-10) for j = 1,...,m to (4-3-11), and recalling that q_j(x*) = 0, we obtain

    0 > \Delta y_n\, y_n^T \frac{\partial\mathcal{L}}{\partial x}(x^*) + \tfrac{1}{2}\Delta y_n^2\, y_n^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\, y_n + O(\Delta y_n^3) .    (4-3-12)

We are now in a position to obtain our contradiction.

If the sufficient conditions (4-3-2) hold then ∂ℒ/∂x (x*) = 0. Therefore (4-3-12) becomes

    0 > \tfrac{1}{2}\Delta y_n^2\, y_n^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\, y_n + O(\Delta y_n^3) .

Multiplying by 2/Δy_n² we get

    0 > y_n^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\, y_n + O(\Delta y_n) .

Letting n → ∞ we see that

    0 \ge y^T \frac{\partial^2\mathcal{L}}{\partial x^2}(x^*)\, y .    (4-3-13)

But we have shown that y satisfies y^T (dq/dx)(x*) = 0^T. Therefore (4-3-13) contradicts
the sufficient condition and we have finished our proof.


4.4 Implications of the Jacobian assumption

There appears to be very little discussion of the Jacobian assumption in the
literature (but see Fiacco and McCormick). Almost always the assumption is made without
any comment or qualification. More importantly, from the beginner's point of view, it is
made without motivation. But this motivation is simply that, as we have seen, the
assumption guarantees, because of (4-2-9), that there exist unique Lagrange multipliers
λ such that

    \frac{df}{dx}(x^*) + \frac{dq}{dx}(x^*)\,\lambda = 0 .    (4-4-1)

Since practical problems need not satisfy the Jacobian assumption it seems desirable
to explore the consequences of removing it. If we do so, then we can no longer be sure
that any λ exist (even non-uniquely) such that (4-4-1) holds. For instance, consider
the problem

    \min\; f(x,y,z) = x^2 + (y - 1)^2 + (z + 1)^2

    \text{subject to} \quad q_1(x,y,z) = x^2 + y^2 + z^2 - 1 = 0

    \text{and} \quad q_2(x,y,z) = x^2 + (y - 2)^2 + z^2 - 1 = 0 .

It is clear (see Fig 7) that the only feasible point is (0, 1, 0)^T which therefore must be the
solution of the problem. Define the Lagrangian by ℒ = f + λ_1 q_1 + λ_2 q_2; then for
this example (4-2-12) becomes

    \frac{\partial\mathcal{L}}{\partial x} = 2x + 2x\lambda_1 + 2x\lambda_2 = 0

    \frac{\partial\mathcal{L}}{\partial y} = 2(y - 1) + 2y\lambda_1 + 2(y - 2)\lambda_2 = 0

    \frac{\partial\mathcal{L}}{\partial z} = 2(z + 1) + 2z\lambda_1 + 2z\lambda_2 = 0 .    (4-4-2)

No values of λ_1 and λ_2 can satisfy (4-4-2) at (0, 1, 0)^T. We have included this example
to illustrate the important fact that even problems with continuous functions f and q
can have isolated solutions. However, if we assume the solution of our equation
constrained problem E is not isolated and that a further assumption (discussed in the
proof below) also holds then we can show that Lagrange multipliers must exist (if not
uniquely).
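
The failure of (4-4-2) at the isolated solution can be confirmed numerically. The sketch below evaluates the gradients of f, q_1 and q_2 of the example at (0, 1, 0)^T and shows that the z-component of the stationarity residual is unaffected by the multipliers. The function names and the grid of trial multipliers are our own illustrative choices.

```python
# The example above: f = x**2 + (y - 1)**2 + (z + 1)**2 with
# q1 = x**2 + y**2 + z**2 - 1 and q2 = x**2 + (y - 2)**2 + z**2 - 1.

def grad_f(x, y, z):
    return (2.0 * x, 2.0 * (y - 1.0), 2.0 * (z + 1.0))

def grad_q1(x, y, z):
    return (2.0 * x, 2.0 * y, 2.0 * z)

def grad_q2(x, y, z):
    return (2.0 * x, 2.0 * (y - 2.0), 2.0 * z)

def q1(x, y, z):
    return x**2 + y**2 + z**2 - 1.0

def q2(x, y, z):
    return x**2 + (y - 2.0)**2 + z**2 - 1.0

def stationarity_residual(point, lam1, lam2):
    """Components of df/dx + lam1*dq1/dx + lam2*dq2/dx at the given point."""
    g, a, b = grad_f(*point), grad_q1(*point), grad_q2(*point)
    return tuple(gi + lam1 * ai + lam2 * bi for gi, ai, bi in zip(g, a, b))

x_star = (0.0, 1.0, 0.0)   # the only feasible point

# Try a grid of trial multipliers (the grid itself is an arbitrary choice).
trial_values = [-10.0, -1.0, 0.0, 1.0, 10.0]
z_components = {
    stationarity_residual(x_star, l1, l2)[2]
    for l1 in trial_values for l2 in trial_values
}
# Both constraint gradients have zero z-component at x*, so the
# z-component of the residual is 2 whatever multipliers are tried.
print("feasible:", q1(*x_star) == 0.0 and q2(*x_star) == 0.0)
print("z-components of the residual:", z_components)
```

The two constraint gradients at this point are (0, 2, 0)^T and (0, -2, 0)^T, which are linearly dependent, so the Jacobian assumption fails here as well.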


As in the proof of the sufficiency of (4-3-2) we can construct a sequence of feasible
points z_n = x* + Δy_n y_n which converges to x*, where y_n converges to a unit vector y.
We have already shown that

    y^T \frac{dq}{dx}(x^*) = 0^T    (4-4-3)

which means that y is orthogonal to any linear combination of the gradient vectors
(dq_i/dx)(x*). Note that in all sequences of feasible vectors of the form z_n = x* + Δy_n y_n
which converge to x*, the y_n will converge to some vector y which satisfies (4-4-3).

Let r be the number of gradient vectors (dq_i/dx)(x*) that are linearly independent.
By renumbering the (dq_i/dx)(x*) if necessary we can assume that (dq_1/dx)(x*),...,(dq_r/dx)(x*) are
linearly independent. It is easy to show that there exist exactly n - r orthonormal
vectors v_j, say, which in addition are orthogonal to any linear combination of the
(dq_i/dx)(x*). We now prove that (df/dx)(x*) is also orthogonal to each of these v_j. But
to prove this we need an additional assumption (as mentioned above).

For each v_j we construct a sequence of the form

    z_n = x^* + \Delta y_n (v_j + y_n)    (4-4-4)

where Δy_n is a sequence of positive numbers converging to 0. Our additional assumption
is that we can choose the vectors y_n to be a sequence converging to 0 such that
each vector z_n is a feasible point. Of course, it is sometimes not possible to choose
such vectors. Consider the problem

    \min\; x^2 + (y - 1)^2 + (z + 1)^2

    \text{subject to} \quad q_1 = x^2 + y^2 - 1 = 0

    \text{and} \quad q_2 = x^2 + y^2 + z^2 - 1 = 0 .

The feasible set is the circle

    x^2 + y^2 = 1 , \qquad z = 0

and the solution of the problem is easily seen to be (0, 1, 0)^T where the gradient vectors of
q_1 and q_2 are both (0, 2, 0)^T.

Now the two orthonormal vectors that are orthogonal to (0, 2, 0)^T are in this case
(1, 0, 0)^T and (0, 0, 1)^T. In particular, for v_j = (0, 0, 1)^T we see from Fig 8 that for z_n
to be feasible the vector y_n has to have a length of at least one unit and hence
cannot converge to 0.

We now return to our proof and assume that the vectors y_n exist as required in
(4-4-4). Since z_n is bounded, z_n (or a subsequence) must converge, and the limit
point is clearly x*.

Now by Taylor's theorem

    f(z_n) = f(x^*) + \Delta y_n (v_j + y_n)^T \frac{df}{dx}(x^*) + O(\Delta y_n^2) .    (4-4-5)

But since f(x*) is a minimum of f we have f(z_n) ≥ f(x*). In view of (4-4-5),
this means that

    f(x^*) + \Delta y_n (v_j + y_n)^T \frac{df}{dx}(x^*) + O(\Delta y_n^2) \ge f(x^*) .

Subtracting f(x*) from both sides and dividing by Δy_n > 0 we obtain

    (v_j + y_n)^T \frac{df}{dx}(x^*) + O(\Delta y_n) \ge 0 .    (4-4-6)

Letting n → ∞, we find that y_n → 0 and Δy_n → 0+ and we see that (4-4-6) implies

    v_j^T \frac{df}{dx}(x^*) \ge 0 .    (4-4-7)

But we could have equally well chosen Δy_n in (4-4-4) to be a sequence of negative
numbers. Then division by Δy_n reverses the inequality sign in the calculation above
and instead of (4-4-7) we obtain

    v_j^T \frac{df}{dx}(x^*) \le 0 .    (4-4-8)

(4-4-7) and (4-4-8) together imply that v_j^T (df/dx)(x*) = 0. Hence we have shown that
(df/dx)(x*) is orthogonal to each of the v_j.

Now the vectors v_1,...,v_{n-r} together with the r linearly independent vectors
(dq_i/dx)(x*) form a basis. This means that any vector, and in particular the vector (df/dx)(x*),
is a linear combination of the v_j and the (dq_i/dx)(x*). That is, there exist numbers
a_1,...,a_{n-r} and λ_1,...,λ_r such that

    -\frac{df}{dx}(x^*) = a_1 v_1 + \ldots + a_{n-r} v_{n-r} + \lambda_1 \frac{dq_1}{dx}(x^*) + \ldots + \lambda_r \frac{dq_r}{dx}(x^*) .    (4-4-9)

But, since the v_j are orthonormal and are orthogonal to (df/dx)(x*) and every (dq_i/dx)(x*),
by premultiplying (4-4-9) by v_j^T (j = 1,...,n-r) it is easy to see that

    a_j = 0 \qquad (j = 1, \ldots, n-r) .

If we also put λ_{r+1} = ... = λ_m = 0, we see that we have established the existence
of a set of numbers λ_i satisfying (4-4-1). Thus the λ_i are Lagrange multipliers and
we have finished our proof.

It follows that, in general, practical problems will have λ satisfying (4-4-1).
Hence we have also proved that, in general, an algorithm that solves the equation
constrained problem E by finding a λ to satisfy the sufficient conditions (4-3-2) will
be successful. The corollary is, of course, that such an algorithm must have in-built
safeguards to prevent it from giving misleading results in cases (such as our example)
where no Lagrange multipliers exist.

5  THE  KUHN-TUCKER  CONDITIONS 

In this section we shall consider the general constrained optimization problem G,
where there are constraints of both equation and inequality type.

    G    \min_{x} f(x) \quad \text{subject to} \quad q(x) = 0 \quad \text{and} \quad c(x) \le 0

where c(x) is an m' × 1 column vector valued function of x.

We have postponed the derivation of the necessary and sufficient conditions of this
problem until after discussing the equation constrained problem E. The reader may feel
that this is because the conditions for the problem G are more difficult to derive
than for any of the problems mentioned earlier. This is not so, for, as we shall see,
the general constrained problem G can be quite readily transformed into the problem E.
Hence if we can derive necessary and sufficient conditions for E, we can also do so
for G. It is for this reason that we have left our discussion until this stage. As
always we shall best proceed by considering related but simpler problems.

5.1 A one-dimensional problem

We consider first the problem

    Z1    \min_{x} f(x) \quad \text{subject to} \quad x \ge 0

To derive a first order necessary condition, we proceed as for the unconstrained problem
U1, by expanding f(x) by Taylor's series about a local solution x*,

    f(x^* + \Delta x) = f(x^*) + \Delta x \frac{df}{dx}(x^*) + O(\Delta x^2) .    (5-1-1)

Now x* is a local solution of Z1 means that

    f(x^* + \Delta x) \ge f(x^*)    (5-1-2)

for all small enough Δx satisfying the constraint x* + Δx ≥ 0. Using (5-1-1) to
eliminate f(x* + Δx) from (5-1-2) we obtain

    f(x^*) + \Delta x \frac{df}{dx}(x^*) + O(\Delta x^2) \ge f(x^*)

or

    \Delta x \frac{df}{dx}(x^*) + O(\Delta x^2) \ge 0 .    (5-1-3)

Note that for x* to be a solution of Z1 also implies that x* satisfies the
constraint, ie that x* ≥ 0. If x* = 0, then x* + Δx satisfies the constraint only
if Δx ≥ 0. Dividing (5-1-3) by Δx > 0 and letting Δx → 0+, we obtain

    \frac{df}{dx}(x^*) \ge 0    (5-1-4)

but if x* > 0, then x* + Δx satisfies the constraint if Δx ≥ -x*. Dividing (5-1-3)
by such Δx < 0 and letting Δx → 0- we get

    \frac{df}{dx}(x^*) \le 0 .    (5-1-5)

If x* > 0, x* + Δx will also satisfy the constraint if Δx > 0 > -x*. Hence
(5-1-4) will also hold. Since (5-1-4) and (5-1-5) both hold, we have that

    \frac{df}{dx}(x^*) = 0 .    (5-1-6)

We summarize the above. A necessary condition that x* be a solution of Z1 is
that

    \frac{df}{dx}(x^*) \ge 0 \quad \text{if } x^* = 0    (5-1-7)

and

    \frac{df}{dx}(x^*) = 0 \quad \text{if } x^* > 0 .    (5-1-8)

It is customary to abbreviate (5-1-7) and (5-1-8) into the one condition

    x^* \frac{df}{dx}(x^*) = 0 .    (5-1-9)

Since x* must also satisfy the constraint x* ≥ 0, we readily see that (5-1-9) is in
fact equivalent to (5-1-7) and (5-1-8).

The geometrical interpretation of (5-1-9) is straightforward, though perhaps not
obvious to the beginner. (5-1-9) simply states that x* is either on the constraint
(ie x* = 0) or it is not. If x* is not on the constraint, then the constraint in
no way restricts x* and hence our problem Z1 is equivalent to the unconstrained
problem U1. If x* does lie on the constraint boundary, then we have solved the
problem Z1 and x* = 0.
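
The two cases of the necessary condition for Z1 (df/dx ≥ 0 when x* = 0, df/dx = 0 when x* > 0) translate directly into a small test. The sketch below is only an illustration: the function `satisfies_z1_condition`, its tolerance argument and the two example functions are our own assumptions, not part of these notes.

```python
def satisfies_z1_condition(x_star, dfdx, tol=1e-12):
    """Check the first order necessary condition for problem Z1.

    x_star : candidate solution (must be feasible, i.e. x_star >= 0)
    dfdx   : value of df/dx at x_star
    tol    : tolerance for treating a number as zero (our assumption)
    """
    if x_star < 0.0:
        return False                 # infeasible
    if x_star <= tol:
        return dfdx >= -tol          # on the boundary: slope must not be negative
    return abs(dfdx) <= tol          # interior: ordinary stationary point

# Two illustrative functions (our choices, not from the notes).
def dfdx_a(x):      # f(x) = (x + 1)**2, unconstrained minimum at x = -1
    return 2.0 * (x + 1.0)

def dfdx_b(x):      # f(x) = (x - 1)**2, unconstrained minimum at x = +1
    return 2.0 * (x - 1.0)

boundary_ok = satisfies_z1_condition(0.0, dfdx_a(0.0))   # solution on the constraint
interior_ok = satisfies_z1_condition(1.0, dfdx_b(1.0))   # constraint plays no part
rejected = satisfies_z1_condition(0.0, dfdx_b(0.0))      # f decreases into x > 0

print(boundary_ok, interior_ok, rejected)
```

The third call shows why the sign of the derivative matters on the boundary: at x = 0 the function (x - 1)² still decreases as x moves into the feasible region, so x = 0 cannot be a solution.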

5.2 A special n-dimensional problem

We can readily extend the method of section 5.1 to deal with the problem

    Z    \min_{x} f(x) \quad \text{subject to} \quad x \ge 0

We expand f(x) about a local solution x*

    f(x^* + \Delta x) = f(x^*) + \Delta x^T \frac{df}{dx}(x^*) + O|\Delta x|^2 .

Now x* is a local solution of Z so that f(x* + Δx) ≥ f(x*) for all small enough
feasible Δx at x*. In this case Δx is feasible means that x* + Δx ≥ 0. We
obtain in the usual manner that

    \Delta x^T \frac{df}{dx}(x^*) + O|\Delta x|^2 \ge 0 .    (5-2-1)

In particular, (5-2-1) holds for Δx = Δx_i e_i where e_i is the ith unit co-ordinate vector
and Δx_i is the (signed) magnitude of Δx. Then

    \Delta x_i \frac{\partial f}{\partial x_i}(x^*) + O(\Delta x_i)^2 \ge 0 .    (5-2-2)

Now (5-2-2) is analogous to (5-1-3) and we can follow exactly the same procedure as in
section 5.1 to obtain results equivalent to (5-1-7) and (5-1-8), namely

    \frac{\partial f}{\partial x_i}(x^*) \ge 0 \quad \text{if } x_i^* = 0    (5-2-3)

    \frac{\partial f}{\partial x_i}(x^*) = 0 \quad \text{if } x_i^* > 0 .    (5-2-4)

As in section 5.1, we abbreviate these to

    x_i^* \frac{\partial f}{\partial x_i}(x^*) = 0 .    (5-2-5)

We can repeat the same procedure for all i. Adding the n equations of the form
(5-2-5) together we obtain

    x^{*T} \frac{df}{dx}(x^*) = 0 .    (5-2-6)

Note that this condition is in fact equivalent to (5-2-5) because x* satisfies the
conditions (5-2-3) and (5-2-4). Hence (5-2-6) is also equivalent to (5-2-3) and
(5-2-4) and is thus a first order necessary condition that x* be a solution of Z.

5.3 Active constraints and the Jacobian assumption

Let x* be a solution of the general constrained problem G. Then for each
inequality constraint c_i(x) ≤ 0, x* either lies on the constraint boundary
(ie c_i(x*) = 0) or it does not. If x* does lie on the boundary, the constraint is
said to be active.

By rearranging the inequality constraints if necessary, we can partition the vector
c^T(x) into (c_A^T(x)  c_B^T(x)) where c_A(x*) = 0 denotes the active constraints and
c_B(x*) < 0 denotes the not active constraints.

As we did for the equation constrained problem, to ensure the existence of unique
Lagrange multipliers we shall have to assume that the columns of

    \left( \frac{dq}{dx}(x^*) \quad \frac{dc_A}{dx}(x^*) \right)

are linearly independent. This is the Jacobian assumption for the general constrained
problem G.

5.4 The general constrained problem

The first order necessary conditions that x* be a local solution of G are
as follows. There exists an m' × 1 vector μ and an m × 1 vector λ such that

    \frac{df}{dx}(x^*) + \frac{dq}{dx}(x^*)\,\lambda + \frac{dc}{dx}(x^*)\,\mu = 0 ,

    \mu^T c(x^*) = 0

and

    \mu \ge 0 .    (5-4-1)

(5-4-1) are called the Kuhn-Tucker conditions. The vectors λ and μ are called
Lagrange multipliers. (The μ are sometimes called Kuhn-Tucker multipliers to emphasise
their being distinct from the λ.) We can immediately derive (5-4-1) from the results
we have obtained earlier.

Now x* is also a solution of the problem

    C    \min_{x} f(x) \quad \text{subject to} \quad q(x) = 0 \quad \text{and} \quad c_A(x) = 0

at least in a small enough region around x*. For if x* is a local solution of G
then

    f(x^*) \le f(x^* + dx)    (5-4-2)

for all small enough dx such that q(x* + dx) = 0 and c(x* + dx) ≤ 0. Now since
c(x) is continuous and c_B(x*) < 0, then by the intermediate value theorem (see
Appendix B) c_B(x* + dx) < 0 for all small enough dx. Hence, provided dx is small
enough, x* + dx will automatically satisfy the non-active constraints. Hence (5-4-2)
will hold for all small enough dx such that q(x* + dx) = 0 and c_A(x* + dx) ≤ 0.
In particular, (5-4-2) must hold for all small enough dx such that c_A(x* + dx) = 0.
Hence x* is also a solution of C.

But C is an equation constrained problem, whose first order necessary conditions
are given by (4-2-12). The Lagrangian for problem C can be written

    \mathcal{L}(x) = f(x) + \lambda^T q(x) + \nu^T c_A(x)

where ν is a column vector of appropriate length.
Hence if x* is a solution of C then

    \frac{df}{dx}(x^*) + \frac{dq}{dx}(x^*)\,\lambda + \frac{dc_A}{dx}(x^*)\,\nu = 0

that is

    \frac{df}{dx}(x^*) = -\frac{dq}{dx}(x^*)\,\lambda - \frac{dc_A}{dx}(x^*)\,\nu .    (5-4-3)


We now show that ν ≥ 0. Suppose instead that ν_j < 0 for at least one j.
Let dx be a feasible vector at x*. Note that we can choose dx such that

    c_i(x^* + dx) = 0 \qquad (i \ne j)

    c_j(x^* + dx) < 0

since otherwise c_j(x) is functionally dependent on the other c_A(x),
c_j = φ(c_A) say. Differentiating with respect to x gives

    \frac{dc_j}{dx} = \frac{dc_A}{dx}\frac{d\phi}{dc_A} .

We see that dc_j/dx is a linear combination of the other constraint gradients. In
particular, at x* this contradicts the Jacobian assumption. Now

    0 = \left[q_i(x^* + dx) - q_i(x^*)\right]\lambda_i = dx^T \frac{dq_i}{dx}\lambda_i + O|dx|^2

for i = 1,...,m. Thus

    -dx^T \frac{dq_i}{dx}\lambda_i = O|dx|^2 .

Similarly

    -dx^T \frac{dc_i}{dx}\nu_i = O|dx|^2

for i ≠ j.

Now from (5-4-3) we have that

    dx^T \frac{df}{dx} = -\sum_i dx^T \frac{dq_i}{dx}\lambda_i - \sum_{i \ne j} dx^T \frac{dc_i}{dx}\nu_i - dx^T \frac{dc_j}{dx}\nu_j
                       = -dx^T \frac{dc_j}{dx}\nu_j + O|dx|^2 .

Writing df = dx^T (df/dx) and dc_j = dx^T (dc_j/dx), this can be written as

    df = -dc_j \nu_j + O|dx|^2 .

But since dx is feasible we must have that dc_j < 0. Also ν_j < 0, therefore
k < 0 where k = -dc_j ν_j. Thus

    df = k + O|dx|^2 .

Since k is of order |dx|, and the functions of interest are continuous, we can find
dx small enough so that df < 0, which contradicts our assumption that x* is a
minimum. Hence by reductio ad absurdum, ν ≥ 0.

Define the m' × 1 column vector μ by μ^T = (ν^T  0^T). Then rearrange μ so
that μ_i = 0 if c_i is not active and μ_i ≥ 0 if c_i is active. As in sections 5.1
and 5.2 we can abbreviate this to

    \mu^T c(x^*) = 0    (5-4-4)

because μ ≥ 0 and c(x*) ≤ 0. Also

    \frac{dc_A}{dx}(x^*)\,\nu = \frac{dc_A}{dx}(x^*)\,\nu + \frac{dc_B}{dx}(x^*)\,0 = \frac{dc}{dx}(x^*)\,\mu .    (5-4-5)

Substituting (5-4-5) into (5-4-3) we have

    \frac{df}{dx}(x^*) + \frac{dq}{dx}(x^*)\,\lambda + \frac{dc}{dx}(x^*)\,\mu = 0 .

We have thus derived the Kuhn-Tucker conditions. We see that the middle condition (5-4-4)
is just an abbreviation of the restriction that the multipliers corresponding to
non-active constraints must be zero. (5-4-4) is sometimes called the complementary slackness
condition.
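
The three Kuhn-Tucker conditions (stationarity, complementary slackness and non-negative multipliers) can be tested at a candidate point once the gradients are known. The sketch below does this for a two-variable problem of our own choosing with two inequality constraints and no equation constraints; the problem, the function `kkt_satisfied` and its tolerance are illustrative assumptions, not taken from these notes.

```python
# Illustrative problem (our assumption, not from the notes):
#   minimize f = (x - 2)**2 + (y - 1)**2
#   subject to c1 = x + y - 2 <= 0 and c2 = -x <= 0   (no equation constraints).

def kkt_satisfied(grad_f, grad_c, c_val, mu, tol=1e-12):
    """Check the conditions (5-4-1), plus feasibility, with no equation constraints.

    grad_f : gradient of f at the candidate point
    grad_c : list of gradients, one per inequality constraint
    c_val  : list of constraint values c_i at the candidate point
    mu     : list of trial multipliers, one per inequality constraint
    """
    n = len(grad_f)
    stationarity = [
        grad_f[k] + sum(m * g[k] for m, g in zip(mu, grad_c))
        for k in range(n)
    ]
    stationary = all(abs(s) <= tol for s in stationarity)
    complementary = abs(sum(m * c for m, c in zip(mu, c_val))) <= tol
    non_negative = all(m >= 0.0 for m in mu)
    feasible = all(c <= tol for c in c_val)
    return stationary and complementary and non_negative and feasible

def problem_data(x, y):
    grad_f = [2.0 * (x - 2.0), 2.0 * (y - 1.0)]
    grad_c = [[1.0, 1.0], [-1.0, 0.0]]
    c_val = [x + y - 2.0, -x]
    return grad_f, grad_c, c_val

# At (1.5, 0.5) the first constraint is active and the second is not,
# so the second multiplier must be zero.
at_solution = kkt_satisfied(*problem_data(1.5, 0.5), mu=[1.0, 0.0])

# The unconstrained minimum (2, 1) violates c1, so it is rejected.
at_infeasible = kkt_satisfied(*problem_data(2.0, 1.0), mu=[0.0, 0.0])

# A feasible but non-optimal point, tested with one trial multiplier vector.
at_origin = kkt_satisfied(*problem_data(0.0, 0.0), mu=[0.0, 0.0])

print(at_solution, at_infeasible, at_origin)
```

Note that such a check only verifies a given pair of point and multipliers; it does not search for them.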
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5 .5  Second  order  conditions 

As  usual  we  state  the  conditions  first.  The  second  order  condition  that  x*  be 
a  solution  of  G  is  that 

2 

AxT  (x*)Ax  >  0 

dx 

for  all  vectors  Ax  satisfying 

rp  del  T1 

A?  ^  (**>  =  2  , 

where  a(x)  represents  the  vector  of  equation  and  active  constraints. 

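The sufficient conditions are

$$ \frac{d\mathcal{L}}{dx}(x^*) = 0 $$

and there exist $\mu \ge 0$ such that

$$ \mu^T c(x^*) = 0 $$

and

$$ \Delta x^T\,\frac{d^2\mathcal{L}}{dx^2}(x^*)\,\Delta x > 0 $$

for all sufficiently small non-zero vectors $\Delta x$ satisfying

$$ \Delta x^T\,\frac{da}{dx}(x^*) = 0^T . $$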


(As  before,  the  sufficient  conditions  imply  the  existence  of  Lagrange  multipliers.) 


The proofs follow exactly those of the analogous conditions of the problem E, except that everywhere active inequality constraints are treated as equation constraints. The non-active constraints only occur in the Lagrangian, where they are multiplied by the zero entries of $\mu$.
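A small numerical sketch may help to show why the second order condition is only imposed on vectors $\Delta x$ tangent to the active constraints. The problem used here (minimize $-x_1 x_2$ subject to $x_1 + x_2 - 2 \le 0$, with local solution $x^* = (1,1)$ and $\mu = 1$) and the helper names are assumptions chosen for illustration.

```python
def hessian_lagrangian():
    # Lagrangian: -x1*x2 + mu*(x1 + x2 - 2). The constraint is linear,
    # so the Hessian of the Lagrangian equals the Hessian of f.
    return [[0.0, -1.0], [-1.0, 0.0]]


def quadratic_form(H, d):
    # Evaluate d^T H d.
    Hd = [sum(H[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Hd[i] for i in range(len(d)))


H = hessian_lagrangian()
da_dx = [1.0, 1.0]       # gradient of the active constraint a(x) = x1 + x2 - 2
tangent = [1.0, -1.0]    # satisfies tangent^T da/dx = 0
normal = [1.0, 1.0]      # not tangent to the constraint

on_tangent = quadratic_form(H, tangent)
off_tangent = quadratic_form(H, normal)
print(on_tangent, off_tangent)
```

The Hessian is indefinite, yet the quadratic form is positive along the tangent direction, which is all the sufficient condition asks for.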


6  THE  DUALITY  THEOREM 

We complete these notes with a statement and derivation of the duality (or Kuhn-Tucker) theorem. This important theorem underlies most numerical methods of constrained optimization.

6.1 The dual function

We define the dual function $\phi(\lambda,\mu)$ for the general constrained problem G by

$$ \phi(\lambda,\mu) = \min_x \left\{ f(x) + \lambda^T q(x) + \mu^T c(x) \right\} . $$
Let $x^*$ be the vector which minimizes

$$ f(x) + \lambda^{*T} q(x) + \mu^{*T} c(x) $$

where $\mu^* \ge 0$.

Note that

$$ \phi(\lambda^*,\mu^*) = f(x^*) + \lambda^{*T} q(x^*) + \mu^{*T} c(x^*) . $$
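For a concrete feel of the dual function, here is a one-variable worked example. The problem is an assumption chosen for illustration, not one from the notes: minimize $f(x) = x^2$ subject to $c(x) = 1 - x \le 0$, with no equation constraints.

```latex
\begin{aligned}
\phi(\mu) &= \min_x \left\{ x^2 + \mu(1 - x) \right\}, \qquad
  \bar{x}(\mu) = \tfrac{1}{2}\mu ,\\
\phi(\mu) &= \tfrac{1}{4}\mu^2 + \mu - \tfrac{1}{2}\mu^2 = \mu - \tfrac{1}{4}\mu^2 ,\\
\max_{\mu \ge 0} \phi(\mu) &= \phi(2) = 1 = f(x^*), \qquad x^* = \bar{x}(2) = 1 .
\end{aligned}
```

In this example the maximum of the dual function equals the constrained minimum of $f$, and the maximizing multiplier $\mu^* = 2$ is non-negative.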


6.2 Statement of the theorem

$x^*$ is a solution of G

(i) if $\phi(\lambda^*,\mu^*) \ge \phi(\lambda,\mu)$ for all $\lambda$ and for all $\mu \ge 0$,

(ii) only if (provided the matrix $\partial^2\mathcal{L}/\partial x^2$ is everywhere positive definite) there exist $\lambda^*$, $\mu^*$ which maximize $\phi(\lambda,\mu)$ for all $\lambda$ and for all $\mu \ge 0$.

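6.3 Proof of part (i)

Let $\mu^* \ge 0$ and $\lambda^*$ be vectors which maximize $\phi(\lambda,\mu)$ for all $\lambda$ and for all $\mu \ge 0$. Then $\lambda^*$, $\mu^*$ must satisfy the Kuhn-Tucker conditions for the maximization problem.

Denote the inequality constraints $\mu \ge 0$ by $\gamma(\mu) = -\mu \le 0$. Let the multipliers associated with $\gamma(\mu)$ be $\omega$. Then since there are no equation constraints, (5-4-1) becomes

$$ \frac{\partial\phi}{\partial\lambda} = 0 \tag{6-3-1} $$

and

$$ \frac{\partial\phi}{\partial\mu} = -\,\omega \le 0 , \qquad \omega^T\mu = 0 . \tag{6-3-2} $$

Let $\bar{x}(\lambda,\mu)$ be the vector which minimizes $f(x) + \lambda^T q(x) + \mu^T c(x)$. Then

$$ \phi(\lambda,\mu) = f\big(\bar{x}(\lambda,\mu)\big) + \lambda^T q\big(\bar{x}(\lambda,\mu)\big) + \mu^T c\big(\bar{x}(\lambda,\mu)\big) . $$

Now

$$
\begin{aligned}
\frac{\partial\phi}{\partial\lambda}
 &= \frac{\partial f}{\partial\lambda} + \frac{\partial\lambda^T}{\partial\lambda}\,q + \frac{\partial q^T}{\partial\lambda}\,\lambda + \frac{\partial\mu^T}{\partial\lambda}\,c + \frac{\partial c^T}{\partial\lambda}\,\mu \\
 &= \frac{\partial f}{\partial\lambda} + q + \frac{\partial q^T}{\partial\lambda}\,\lambda + \frac{\partial c^T}{\partial\lambda}\,\mu \\
 &= \frac{\partial\bar{x}^T}{\partial\lambda}\frac{df}{dx} + \frac{\partial\bar{x}^T}{\partial\lambda}\frac{dq}{dx}\,\lambda + \frac{\partial\bar{x}^T}{\partial\lambda}\frac{dc}{dx}\,\mu + q ,
\end{aligned}
$$

where we have used (A-2). Therefore

$$ \frac{\partial\phi}{\partial\lambda} = \frac{\partial\bar{x}^T}{\partial\lambda}\left( \frac{df}{dx} + \frac{dq}{dx}\,\lambda + \frac{dc}{dx}\,\mu \right) + q(\bar{x}) . $$

At any point $\bar{x}$ which minimizes $\mathcal{L}$ we have $\dfrac{df}{dx} + \dfrac{dq}{dx}\,\lambda + \dfrac{dc}{dx}\,\mu = 0$. Therefore

$$ \frac{\partial\phi}{\partial\lambda} = q(\bar{x}) . \tag{6-3-3} $$

Similarly,

$$ \frac{\partial\phi}{\partial\mu} = c(\bar{x}) . \tag{6-3-4} $$

In particular, (6-3-3) and (6-3-4) must hold for $x^* = \bar{x}(\lambda^*,\mu^*)$. Substituting them into (6-3-1) and (6-3-2) we obtain

$$ q(x^*) = 0 , \qquad c(x^*) \le 0 \tag{6-3-5} $$

$$ \mu^{*T} c(x^*) = 0 . \tag{6-3-6} $$

(6-3-5) shows that $x^*$ satisfies the constraints of the problem G.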
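Results (6-3-3) and (6-3-4) say that the gradient of the dual function is simply the constraint vector evaluated at the minimizer of the Lagrangian. The sketch below checks this by central differences on an assumed quadratic problem ($f = x_1^2 + 2x_2^2$, $q = x_1 + x_2 - 1$, $c = -x_1$); the problem, the closed-form minimizer and the function names are illustrative choices, not taken from the notes.

```python
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def q(x):
    # single equation constraint q(x) = 0
    return x[0] + x[1] - 1.0


def c(x):
    # single inequality constraint c(x) <= 0
    return -x[0]


def x_bar(lam, mu):
    # Minimizer of f + lam*q + mu*c, from 2*x1 + lam - mu = 0 and 4*x2 + lam = 0.
    return ((mu - lam) / 2.0, -lam / 4.0)


def phi(lam, mu):
    # Dual function: the Lagrangian evaluated at its minimizer.
    x = x_bar(lam, mu)
    return f(x) + lam * q(x) + mu * c(x)


def central_difference(g, t, h=1e-5):
    return (g(t + h) - g(t - h)) / (2.0 * h)


lam, mu = 0.3, 0.7
x = x_bar(lam, mu)
dphi_dlam = central_difference(lambda t: phi(t, mu), lam)
dphi_dmu = central_difference(lambda t: phi(lam, t), mu)
print(dphi_dlam, q(x))
print(dphi_dmu, c(x))
```

The terms involving the derivative of $\bar{x}$ drop out because the Lagrangian is stationary at $\bar{x}$, which is why no sensitivity of the minimizer needs to be computed.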
Let $x$ be any point that also satisfies the constraints. Then by definition of $x^*$

$$ f(x^*) + \lambda^{*T} q(x^*) + \mu^{*T} c(x^*) \le f(x) + \lambda^{*T} q(x) + \mu^{*T} c(x) . \tag{6-3-7} $$

But $q(x^*)$ and $q(x)$ are zero. From (6-3-6), $\mu^{*T} c(x^*)$ is also zero. Since $c(x) \le 0$ and $\mu^* \ge 0$ we have $\mu^{*T} c(x) \le 0$. Hence we get

$$
\begin{aligned}
f(x^*) &= f(x^*) + \lambda^{*T} q(x^*) + \mu^{*T} c(x^*) \\
       &\le f(x) + \lambda^{*T} q(x) + \mu^{*T} c(x) = f(x) + \mu^{*T} c(x) \le f(x) .
\end{aligned}
$$

Hence $x^*$ is a solution of the problem G.
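Part (i) suggests a simple numerical scheme: maximize the dual function over $\mu \ge 0$ and read off $x^*$ as the corresponding minimizer of the Lagrangian. The following is a minimal sketch using projected gradient ascent, with $\partial\phi/\partial\mu = c(\bar{x})$ taken from (6-3-4). The test problem (minimize $x^2$ subject to $1 - x \le 0$), the step length and the iteration count are assumptions made for illustration; this is not a method described in the notes.

```python
def f(x):
    return x * x


def c(x):
    # constraint c(x) = 1 - x <= 0, ie x >= 1
    return 1.0 - x


def x_bar(mu):
    # Minimizer of the Lagrangian x^2 + mu*(1 - x) for fixed mu.
    return mu / 2.0


def maximize_dual(mu0=0.0, step=1.0, iterations=60):
    mu = mu0
    for _ in range(iterations):
        # Ascent step along d(phi)/d(mu) = c(x_bar(mu)), then projection onto mu >= 0.
        mu = max(0.0, mu + step * c(x_bar(mu)))
    return mu


mu_star = maximize_dual()
x_star = x_bar(mu_star)
print(mu_star, x_star, f(x_star))
```

For this problem the iteration converges to $\mu^* = 2$ and $x^* = 1$, and no feasible point gives a smaller value of $f$, in agreement with the chain of inequalities above.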

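6.4 Proof of part (ii)

Let $x^*$ be a solution of G and let $\partial^2\mathcal{L}/\partial x^2$ be positive definite everywhere. At any point $\bar{x}(\lambda,\mu)$ which minimizes $\mathcal{L}$ we must have

$$ \frac{df}{dx} + \frac{dq}{dx}\,\lambda + \frac{dc}{dx}\,\mu = 0 $$

which can be written

$$ \frac{df}{dx} + \sum_{i=1}^{m} \lambda_i \frac{dq_i}{dx} + \sum_{i=1}^{m'} \mu_i \frac{dc_i}{dx} = 0 . \tag{6-4-1} $$

Differentiating with respect to $\lambda_j$ gives

$$ \frac{\partial}{\partial\lambda_j}\!\left(\frac{df}{dx}\right) + \sum_{i \ne j} \lambda_i \frac{\partial}{\partial\lambda_j}\!\left(\frac{dq_i}{dx}\right) + \frac{\partial}{\partial\lambda_j}\!\left(\lambda_j \frac{dq_j}{dx}\right) + \sum_{i=1}^{m'} \mu_i \frac{\partial}{\partial\lambda_j}\!\left(\frac{dc_i}{dx}\right) = 0 . \tag{6-4-2} $$

In view of (A-3), (6-4-2) becomes

$$ \left(\frac{d^2 f}{dx^2}\right)^{T}\frac{\partial\bar{x}}{\partial\lambda_j} + \sum_{i \ne j} \lambda_i \left(\frac{d^2 q_i}{dx^2}\right)^{T}\frac{\partial\bar{x}}{\partial\lambda_j} + \frac{dq_j}{dx} + \lambda_j \left(\frac{d^2 q_j}{dx^2}\right)^{T}\frac{\partial\bar{x}}{\partial\lambda_j} + \sum_{i=1}^{m'} \mu_i \left(\frac{d^2 c_i}{dx^2}\right)^{T}\frac{\partial\bar{x}}{\partial\lambda_j} = 0 $$

which, on transposing, can be further simplified to

$$ \frac{\partial\bar{x}^T}{\partial\lambda_j}\,\frac{d^2\mathcal{L}}{dx^2} + \left(\frac{dq_j}{dx}\right)^{T} = 0^T . \tag{6-4-3} $$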
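For $j = 1, \dots, m$ the row vector equations (6-4-3) can be combined to obtain the matrix equation

$$ \frac{\partial\bar{x}^T}{\partial\lambda}\,\frac{d^2\mathcal{L}}{dx^2} + \left(\frac{dq}{dx}\right)^{T} = 0 . \tag{6-4-4} $$

Since $d^2\mathcal{L}/dx^2$ is positive definite it has an inverse. Hence (6-4-4) may be written

$$ \frac{\partial\bar{x}^T}{\partial\lambda} = -\left(\frac{dq}{dx}\right)^{T}\left(\frac{d^2\mathcal{L}}{dx^2}\right)^{-1} . \tag{6-4-5} $$

Similarly, by differentiating (6-4-1) by $\mu$ we obtain

$$ \frac{\partial\bar{x}^T}{\partial\mu} = -\left(\frac{dc}{dx}\right)^{T}\left(\frac{d^2\mathcal{L}}{dx^2}\right)^{-1} . \tag{6-4-6} $$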


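Putting

$$ v = \begin{pmatrix} \lambda \\ \mu \end{pmatrix} \quad\text{and}\quad a = \begin{pmatrix} q \\ c_A \end{pmatrix} $$

where $c_A$ is the vector of active constraints, we have

$$ \frac{d^2\phi}{dv^2} = \begin{pmatrix} \dfrac{\partial^2\phi}{\partial\lambda^2} & \dfrac{\partial^2\phi}{\partial\lambda\,\partial\mu} \\[2ex] \dfrac{\partial^2\phi}{\partial\mu\,\partial\lambda} & \dfrac{\partial^2\phi}{\partial\mu^2} \end{pmatrix} $$

and from (6-3-3) and (6-3-4) we have

$$ \frac{\partial^2\phi}{\partial\lambda^2} = \frac{\partial}{\partial\lambda}\left(\frac{\partial\phi}{\partial\lambda}\right)^{T} = \frac{\partial}{\partial\lambda}\Big[ q^T\big(\bar{x}(\lambda,\mu)\big) \Big] = \frac{\partial\bar{x}^T}{\partial\lambda}\frac{dq}{dx} \tag{6-4-7} $$

$$ \frac{\partial^2\phi}{\partial\lambda\,\partial\mu} = \frac{\partial}{\partial\lambda}\left(\frac{\partial\phi}{\partial\mu}\right)^{T} = \frac{\partial}{\partial\lambda}\Big[ c^T\big(\bar{x}(\lambda,\mu)\big) \Big] = \frac{\partial\bar{x}^T}{\partial\lambda}\frac{dc}{dx} . \tag{6-4-8} $$

Similarly we have

$$ \frac{\partial^2\phi}{\partial\mu\,\partial\lambda} = \frac{\partial\bar{x}^T}{\partial\mu}\frac{dq}{dx} \tag{6-4-9} $$

and

$$ \frac{\partial^2\phi}{\partial\mu^2} = \frac{\partial\bar{x}^T}{\partial\mu}\frac{dc}{dx} . \tag{6-4-10} $$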
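Therefore

$$ \frac{d^2\phi}{dv^2} = \begin{pmatrix} \dfrac{\partial\bar{x}^T}{\partial\lambda}\dfrac{dq}{dx} & \dfrac{\partial\bar{x}^T}{\partial\lambda}\dfrac{dc}{dx} \\[2ex] \dfrac{\partial\bar{x}^T}{\partial\mu}\dfrac{dq}{dx} & \dfrac{\partial\bar{x}^T}{\partial\mu}\dfrac{dc}{dx} \end{pmatrix} . $$

Substituting in (6-4-5) and (6-4-6) we get

$$ \frac{d^2\phi}{dv^2} = -\begin{pmatrix} \left(\dfrac{dq}{dx}\right)^{T}\left(\dfrac{d^2\mathcal{L}}{dx^2}\right)^{-1}\dfrac{dq}{dx} & \left(\dfrac{dq}{dx}\right)^{T}\left(\dfrac{d^2\mathcal{L}}{dx^2}\right)^{-1}\dfrac{dc}{dx} \\[2ex] \left(\dfrac{dc}{dx}\right)^{T}\left(\dfrac{d^2\mathcal{L}}{dx^2}\right)^{-1}\dfrac{dq}{dx} & \left(\dfrac{dc}{dx}\right)^{T}\left(\dfrac{d^2\mathcal{L}}{dx^2}\right)^{-1}\dfrac{dc}{dx} \end{pmatrix} . $$

Hence, with $c$ restricted to the active constraints $c_A$,

$$ \frac{d^2\phi}{dv^2} = -\left(\frac{da}{dx}\right)^{T}\left(\frac{d^2\mathcal{L}}{dx^2}\right)^{-1}\frac{da}{dx} . \tag{6-4-11} $$

Since $da/dx$ is of full rank and $d^2\mathcal{L}/dx^2$ is positive definite, (6-4-11) implies that $d^2\phi/dv^2$ is negative definite.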
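Formula (6-4-11) can be sanity-checked on a problem small enough to solve by hand. The sketch below assumes the problem $f = x_1^2 + 2x_2^2$ with the single equation constraint $q = x_1 + x_2 - 1 = 0$, so that $v$ reduces to the scalar $\lambda$; the problem and the names are illustrative assumptions, not taken from the notes.

```python
def x_bar(lam):
    # Minimizer of x1^2 + 2*x2^2 + lam*(x1 + x2 - 1) for fixed lam.
    return (-lam / 2.0, -lam / 4.0)


def phi(lam):
    # Dual function: the Lagrangian evaluated at its minimizer.
    x1, x2 = x_bar(lam)
    return x1 * x1 + 2.0 * x2 * x2 + lam * (x1 + x2 - 1.0)


def second_difference(g, t, h=1e-3):
    # Central estimate of the second derivative of g at t.
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)


# The Hessian of the Lagrangian is diag(2, 4), so its inverse is diag(1/2, 1/4);
# dq/dx is the column (1, 1)^T.
hessian_inverse_diagonal = (0.5, 0.25)
dq_dx = (1.0, 1.0)

# Right hand side of (6-4-11): -(dq/dx)^T (Hessian)^(-1) (dq/dx).
predicted = -sum(g * h_inv * g for g, h_inv in zip(dq_dx, hessian_inverse_diagonal))
measured = second_difference(phi, 0.4)
print(predicted, measured)
```

Both values are negative, so the dual function is concave here, as the argument above requires.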

Define

$$ v^* = \begin{pmatrix} \lambda^* \\ \mu^* \end{pmatrix} $$

to be the Lagrange multipliers corresponding to $x^*$. We shall show that $v^*$ satisfies all the sufficient conditions to maximize $\phi(v)$ for all $\mu \ge 0$ and for all $\lambda$.

Let $y$ be the Lagrange multipliers of this problem. Then the Lagrangian is $L = \phi + y^T\mu$. Now from (6-3-3) and (6-3-4) we have that

$$
\begin{aligned}
\frac{\partial L}{\partial\lambda} &= \frac{\partial\phi}{\partial\lambda} = q(x^*) = 0 \\
\frac{\partial L}{\partial\mu} &= \frac{\partial\phi}{\partial\mu} + y = c(x^*) + y .
\end{aligned}
\tag{6-4-12}
$$

We set $y_i = 0$ if $c_i(x^*) = 0$ and $y_i = -\,c_i(x^*)$ if $c_i(x^*) < 0$. Then $y \ge 0$.

Also, because $\mu^*$ is itself a Kuhn-Tucker multiplier, we have that $\mu^*_i = 0$ if $c_i(x^*) < 0$ and $\mu^*_i \ge 0$ if $c_i(x^*) = 0$. Hence $y_i = 0$ if $\mu^*_i > 0$ and $\mu^*_i = 0$ if $y_i > 0$. We can express this as $y^T\mu^* = 0$. From (6-4-12) we therefore have that $\partial L/\partial\mu = c(x^*) + y = 0$. Hence $\partial L/\partial v = 0$ and therefore $v^*$ satisfies the sufficient conditions and we have finished our proof.
Appendix A

STANDARD RESULTS OF VECTOR DIFFERENTIATION

Apart from the results discussed in section 2.2, the following are also used in these notes.

(1) Let $x^T = (v^T \;\; u^T)$ and suppose $f = f(u,v)$. Then the second total derivative of $f$ along the contour $v = v(u)$ is given by

$$ \frac{d^2 f}{du^2} = \frac{dx^T}{du}\,\frac{d^2 f}{dx^2}\left(\frac{dx^T}{du}\right)^{T} + \sum_j \frac{\partial f}{\partial v_j}\,\frac{d^2 v_j}{du^2} . $$

(2) Let $f = f(x)$ and suppose $x = x(u)$, for some vector $u$. A chain rule applies in the form

$$ \frac{df}{du} = \frac{dx^T}{du}\,\frac{df}{dx} . $$

(3) Let $f$ be an m-vector function of $n$ variables $x$. Let the $x$ also depend on a scalar $t$. Then $f$ is implicitly a function of $t$ and the chain rule gives

$$ \frac{df}{dt} = \left(\frac{df}{dx}\right)^{T}\frac{dx}{dt} . $$
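Result (1) can be verified on a scalar example. The functions below are assumptions chosen for illustration: take scalar $u$ and $v$ with $f(u,v) = u^2 v$ and the contour $v(u) = u^3$, so that $f = u^5$ along the contour and the answer must be $20u^3$.

```latex
\begin{aligned}
x^T &= (v \;\; u), \qquad \frac{dx^T}{du} = (3u^2 \;\; 1), \qquad
\frac{d^2 f}{dx^2} = \begin{pmatrix} 0 & 2u \\ 2u & 2v \end{pmatrix},\\
\frac{dx^T}{du}\,\frac{d^2 f}{dx^2}\left(\frac{dx^T}{du}\right)^{T}
  &= 2\,(3u^2)(2u) + 2v = 12u^3 + 2u^3 = 14u^3 ,\\
\frac{\partial f}{\partial v}\,\frac{d^2 v}{du^2} &= u^2 \cdot 6u = 6u^3 ,\\
\frac{d^2 f}{du^2} &= 14u^3 + 6u^3 = 20u^3 = \frac{d^2}{du^2}\,u^5 .
\end{aligned}
```

The second term, which involves the curvature of the contour itself, is needed to obtain the correct total.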


Appendix B

STATEMENTS OF THEOREMS ASSUMED IN THE TEXT

B.1 Implicit function theorem³

Let $u(x)$ be $m$ continuously differentiable functions of $n$ variables $x$ ($m < n$). If

$$ \operatorname{rank}\left(\frac{du}{dx}\right) = m $$

then it is possible to solve for $m$ of the variables, say $x_1, \dots, x_m$, in terms of the remaining $n - m$ variables $x_{m+1}, \dots, x_n$, ie

$$ x_i = \psi_i(x_{m+1}, \dots, x_n) \qquad (i = 1, \dots, m) . $$

The $m$ functions $\psi_i$ are called implicit functions.

B.2 Mean value theorem

Both the mean value theorem and the second mean value theorem can be derived from the fundamental inequality³.

Let $f$ be a differentiable function of $n$ variables $x$. Then there exists a number $\xi$ satisfying $0 \le \xi \le 1$ such that

$$ \Delta x^T\,\frac{df}{dx}(x + \xi\,\Delta x) = f(x + \Delta x) - f(x) . $$


B.3 Second mean value theorem

Let $f$ be a twice differentiable function of $n$ variables $x$. Then for all $\xi > 0$

$$ \frac{df}{dx}(x + \xi\,\Delta x) = \frac{df}{dx}(x) + \xi\,\frac{d^2 f}{dx^2}(x)\,\Delta x + O(\xi^2) . $$

B.4 Taylor's theorem for vector functions of vectors

Let $f$ be a column of $m$ differentiable functions of an n-vector $x$. Then

$$ f(x + \Delta x) = f(x) + \left(\frac{df}{dx}\right)^{T}\Delta x + O|\Delta x|^2 . $$

B.5 Taylor's theorem for scalar functions of vectors

Let $f$ be a scalar valued and at least twice differentiable function of an n-vector $x$. Then

$$ f(x + \Delta x) = f(x) + \left(\frac{df}{dx}\right)^{T}\Delta x + \tfrac{1}{2}\,\Delta x^T\,\frac{d^2 f}{dx^2}\,\Delta x + O|\Delta x|^3 . $$

Note that

$$ \left(\frac{df}{dx}\right)^{T}\Delta x = \Delta x^T\,\frac{df}{dx} . $$
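The order of the remainder in B.5 can be observed numerically: for a smooth function, halving $\Delta x$ should reduce the error of the second order expansion by a factor of about eight. The sketch below uses the assumed test function $f(x) = \exp(x_1 + 2x_2)$, whose gradient and Hessian are known in closed form; the function and the step sizes are illustrative choices only.

```python
import math


def f(x):
    return math.exp(x[0] + 2.0 * x[1])


def taylor2(x, dx):
    # Second order Taylor expansion about x. For this f the gradient is
    # f*(1, 2) and the Hessian is f*[[1, 2], [2, 4]].
    fx = f(x)
    g = [fx, 2.0 * fx]
    H = [[fx, 2.0 * fx], [2.0 * fx, 4.0 * fx]]
    linear = sum(g[i] * dx[i] for i in range(2))
    quadratic = 0.5 * sum(dx[i] * H[i][j] * dx[j] for i in range(2) for j in range(2))
    return fx + linear + quadratic


def remainder(h):
    # Error of the expansion for the step dx = (h, h) taken from the origin.
    x = [0.0, 0.0]
    dx = [h, h]
    shifted = [x[0] + dx[0], x[1] + dx[1]]
    return abs(f(shifted) - taylor2(x, dx))


# Ratio of the errors when the step is halved; close to 2**3 for a cubic remainder.
ratio = remainder(0.02) / remainder(0.01)
print(ratio)
```

A ratio near four instead of eight would indicate that the quadratic term had been formed incorrectly.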

B.6 The intermediate value theorem

We state the n-dimensional analogue of this well-known theorem in the following form.

Let $f$ be a continuous function of $n$ variables $x$. Let $f(x_1) < 0$ and $f(x_2) > 0$. Then there exists a point $\xi$ lying on the line segment joining $x_1$ and $x_2$ such that $f(\xi) = 0$.
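The theorem is constructive in practice: bisection along the segment locates such a point $\xi$. The following is a minimal sketch, assuming the test function $f(x) = x_1^2 + x_2^2 - 1$ and the end points $(0,0)$ and $(2,1)$; the function names and the tolerance are illustrative choices.

```python
def f(x):
    # Continuous test function: negative inside the unit circle, positive outside.
    return x[0] * x[0] + x[1] * x[1] - 1.0


def point_on_segment(x1, x2, t):
    # Point x1 + t*(x2 - x1), with t in [0, 1].
    return [a + t * (b - a) for a, b in zip(x1, x2)]


def bisect_on_segment(func, x1, x2, tol=1e-12):
    # Assumes func(x1) < 0 < func(x2); halves the parameter interval until it is short.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(point_on_segment(x1, x2, mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    return point_on_segment(x1, x2, 0.5 * (lo + hi))


xi = bisect_on_segment(f, [0.0, 0.0], [2.0, 1.0])
print(xi, f(xi))
```

The sign change at the end points is what guarantees that each halved interval still contains a zero of $f$.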
LIST OF SYMBOLS

$a(x)$                  vector of active constraints
$(a_{ij})$              matrix whose i,j-entry is $a_{ij}$
$(A)_{ij}$              the i,j-entry of matrix $A$
$c(x)$                  vector of $m'$ inequality constraints
$df/dx$                 the gradient vector, ie the column vector of first partial derivatives of the scalar function $f$
$d^2f/dx^2$             the Hessian matrix, ie the matrix of second partial derivatives of the scalar function $f$
$f(x)$                  a scalar function of the $n$ variables $x$
$f(x)$                  a vector-valued function of the $n$ variables $x$
$I_n$                   the $n \times n$ identity matrix
$L$                     the Lagrangian of the dual function
$\mathcal{L}$           the Lagrangian function
                        $\mathcal{L} = f + \lambda^T q$ for the equation constrained problem
                        $\mathcal{L} = f + \lambda^T q + \mu^T c$ for the general constrained problem
$0$                     column vector of zero entries
$0^T$                   row vector of zero entries
$O(\Delta x^n)$         terms of order $\Delta x^n$ and higher terms
$O|\Delta x|^n$         terms of order $|\Delta x|^n$ and higher terms
$[0,1]$                 the closed interval of numbers between 0 and 1, ie the interval of numbers $x$ such that $0 \le x \le 1$
$q(x)$                  vector of $m$ equation constraints
$T$                     symbol of vector or matrix transposition
$x$                     column vector of (not necessarily $n$) variables
$x^*$                   solution of the particular minimization problem under discussion
$\lambda$               column vector of Lagrange multipliers
$\lambda^*$             column vector of Lagrange multipliers corresponding to $x^*$ in duality theorem
$\mu$                   column vector of Kuhn-Tucker multipliers
$\mu^*$                 column vector of Kuhn-Tucker multipliers corresponding to $x^*$ in duality theorem
$\phi(\lambda,\mu)$     $\min_x \{ f(x) + \lambda^T q(x) + \mu^T c(x) \}$
$\triangleq$            definition symbol, ie the left hand side of the equation is defined by the right hand side